DOC: added string processing comparison with R #16502
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -530,6 +530,103 @@ For more details and examples see :ref:`categorical introduction <categorical>` | |
:ref:`differences to R's factor <categorical.rfactor>`. | ||
|
||
|
||
String Processing | ||
----------------- | ||
|
||
Length | ||
~~~~~~ | ||
|
||
R determines the length of a character string with the ``nchar`` function. | ||
``nchar`` counts leading and trailing blanks. To exclude them, apply | ||
``trimws`` before calling ``nchar``. | ||
|
||
.. code-block:: none | ||
Review comment: Is there an R highlighter?
Review comment: http://pygments.org/docs/lexers/#lexers-for-the-r-s-languages
Review comment: Yeah, I realize that these produce the same output. I don't think we show the output elsewhere, so maybe that is OK (though the code formatting would obviously be nice). |
||
|
||
df <- data.frame(color = c('red', ' blue', 'green ', ' yellow ')) | ||
nchar(as.character(df$color)) | ||
nchar(trimws(as.character(df$color))) | ||
Review comment: It would be nice to show the output here (in other words, run the R code and show its output too, if you can). |
||
|
||
pandas determines the length of each string in a Series with ``Series.str.len``, | ||
the vectorized counterpart of Python's ``len``. ``len`` counts leading and trailing | ||
blanks. To exclude them, call ``Series.str.strip`` before ``Series.str.len``. | ||
|
||
.. code-block:: none | ||
Review comment: Use ``.. ipython:: python`` here (and for all pandas code that is run). |
||
|
||
df = pd.DataFrame({'color': ['red', ' blue', 'green ', ' yellow ']}) | ||
df['color'].str.len() | ||
df['color'].str.strip().str.len() | ||
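The effect of stripping before measuring can be checked in plain Python. This is a minimal sketch, assuming the pandas ``.str`` methods behave element by element like the built-in ``str`` methods; the variable names are invented for the illustration.

```python
# Same sample values as the pandas example above.
colors = ["red", " blue", "green ", " yellow "]

# len() counts leading and trailing blanks.
raw_lengths = [len(c) for c in colors]

# Stripping first removes the blanks from the count.
trimmed_lengths = [len(c.strip()) for c in colors]

print(raw_lengths)      # [3, 5, 6, 8]
print(trimmed_lengths)  # [3, 4, 5, 6]
```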
|
||
|
||
Find Position | ||
~~~~~~~~~~~~~ | ||
|
||
R determines the position of a substring in a string with the | ||
``regexpr`` function. ``regexpr`` takes the pattern to search for as | ||
its first argument and returns the first position at which it matches | ||
in the string you supply as the second argument. | ||
|
||
.. code-block:: none | ||
|
||
df <- data.frame(sex = c('MALE', 'FEMALE')) | ||
pos <- regexpr("ALE", df$sex) | ||
pos[1:2] | ||
|
||
pandas determines the position of a substring in a string with the | ||
``Series.str.find`` method. ``find`` returns the position of the first | ||
occurrence of the substring, or ``-1`` if the substring is not found. | ||
Keep in mind that Python indexes are zero-based, whereas | ||
R indexes are one-based. | ||
Review comment: Simpler: "Keep in mind that Python 0-indexes, whereas R 1-indexes". |
||
|
||
.. code-block:: none | ||
|
||
df = pd.DataFrame({'sex': ['MALE', 'FEMALE']}) | ||
df['sex'].str.find("ALE") | ||
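To make the index offset concrete, here is a small plain-Python sketch. It assumes ``Series.str.find`` follows the built-in ``str.find`` for each element; the names ``zero_based`` and ``one_based`` are illustrative only. Adding 1 to a successful match gives the position R's ``regexpr`` would report, while ``-1`` means "not found" in both languages.

```python
sexes = ["MALE", "FEMALE"]

# str.find returns the zero-based position of the first match, or -1.
zero_based = [s.find("ALE") for s in sexes]

# Shift successful matches by one to get R-style one-based positions.
one_based = [p + 1 if p >= 0 else -1 for p in zero_based]

# A substring that does not occur yields -1 rather than an error.
missing = "MALE".find("XYZ")

print(zero_based)  # [1, 3]
print(one_based)   # [2, 4]
print(missing)     # -1
```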
|
||
Substring | ||
~~~~~~~~~ | ||
|
||
R extracts a substring from a string based on its position | ||
with the ``substr`` function. | ||
|
||
.. code-block:: none | ||
|
||
df <- data.frame(sex = c('MALE', 'FEMALE')) | ||
substr(df$sex, 1, 1) | ||
|
||
In pandas, you can use ``[]`` notation on the ``.str`` accessor to extract | ||
a substring by position. Keep in mind that Python indexes are zero-based | ||
and that the end of a slice is exclusive. | ||
Review comment: Slightly simpler: "Keep in mind that Python 0-indexes". |
||
|
||
.. code-block:: none | ||
|
||
df = pd.DataFrame({'sex': ['MALE', 'FEMALE']}) | ||
df['sex'].str[0:1] | ||
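The mapping between R's ``substr`` arguments and a Python slice can be sketched as follows. ``r_substr`` is a hypothetical helper written for this illustration, not part of pandas or the standard library: R positions are one-based and inclusive, so only the start needs shifting.

```python
def r_substr(text, start, stop):
    """Hypothetical helper: mimic R's substr(text, start, stop)."""
    # R positions are 1-based and inclusive; Python slices are 0-based
    # with an exclusive end, so only the start is shifted by one.
    return text[start - 1:stop]


sexes = ["MALE", "FEMALE"]

# Equivalent of substr(df$sex, 1, 1) in R and df['sex'].str[0:1] in pandas.
first_letters = [r_substr(s, 1, 1) for s in sexes]

print(first_letters)             # ['M', 'F']
print(r_substr("FEMALE", 3, 6))  # MALE
```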
|
||
|
||
Upcase and Lowcase | ||
~~~~~~~~~~~~~~~~~~ | ||
|
||
The R ``toupper`` and ``tolower`` functions change the case of the | ||
character string. | ||
|
||
.. code-block:: none | ||
|
||
df <- data.frame(name = c('Johnny Bravo', 'Alex Mack')) | ||
toupper(df$name) | ||
tolower(df$name) | ||
|
||
The equivalent pandas methods are ``Series.str.upper`` and ``Series.str.lower``. | ||
In addition, ``Series.str.title`` converts each string to title case, | ||
capitalizing the first letter of every word. | ||
|
||
.. code-block:: none | ||
Review comment: Same comment as above. |
||
|
||
df = pd.DataFrame({'name': ['Johnny Bravo', 'Alex Mack']}) | ||
df['name'].str.upper() | ||
df['name'].str.lower() | ||
df['name'].str.title() | ||
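A quick plain-Python check of the three case conversions, again assuming the pandas ``.str`` methods mirror the built-in ``str`` methods for each element (the list names are illustrative):

```python
names = ["Johnny Bravo", "Alex Mack"]

upper_names = [n.upper() for n in names]
lower_names = [n.lower() for n in names]

# title() capitalizes the first letter of each word, so applying it to
# the lowercased names restores the original capitalization here.
title_names = [n.title() for n in lower_names]

print(upper_names)  # ['JOHNNY BRAVO', 'ALEX MACK']
print(lower_names)  # ['johnny bravo', 'alex mack']
print(title_names)  # ['Johnny Bravo', 'Alex Mack']
```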
|
||
|
||
.. |c| replace:: ``c`` | ||
.. _c: http://stat.ethz.ch/R-manual/R-patched/library/base/html/c.html | ||
|
||
|
Review comment: Add a section label here, like ``_compare_with_r.string``. (Adding labels to some of the sub-sections as well would be great.) The section title goes right after the label.
Review comment: If you aren't familiar with Sphinx, the label line needs to start with ``..``, for example ``.. _compare_with_r.string:``. See http://www.sphinx-doc.org/en/stable/markup/inline.html#cross-referencing-arbitrary-locations