[WIP] NIST strong line retrieval #29

simontorres · 2018-06-19T22:13:50Z

This PR is an example of how to load NIST data into a pandas DataFrame.
I'm not sure it qualifies for merging, but it should be useful for the discussion. Otherwise, give me feedback and I can improve it.
See #28

There is at least one other way to do the same thing, using BeautifulSoup, but I used HTMLParser, which I think is lower level.
The code assumes two-character symbols for the chemical elements.
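For comparison, a minimal sketch of the low-level approach: collecting table cells with the standard-library `html.parser.HTMLParser`. The table markup, the class `TableCellParser`, and the helper `split_spectrum` are hypothetical illustrations, not the code in this PR. `split_spectrum` splits on whitespace, which is one way to avoid assuming two-character element symbols (H, C, N have one).

```python
from html.parser import HTMLParser


class TableCellParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row.

    Hypothetical helper for illustration only; real NIST pages may need
    extra handling (nested tags, footnote markers, empty cells).
    """

    def __init__(self):
        super().__init__()
        self.rows = []    # finished rows: list of lists of cell strings
        self._row = None  # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join fragments, since data may arrive in several chunks.
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Text outside a cell (e.g. whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)


def split_spectrum(label):
    """Split a label such as 'Ne II' or 'H I' into (element, ion stage)."""
    element, stage = label.split()
    return element, stage


html = """
<table>
  <tr><th>Intensity</th><th>Wavelength</th><th>Spectrum</th></tr>
  <tr><td>100</td><td>2809.485</td><td>Ne II</td></tr>
  <tr><td>80</td><td>2906.592</td><td>Ne II</td></tr>
</table>
"""

parser = TableCellParser()
parser.feed(html)
header, *body = parser.rows
print(header)                      # ['Intensity', 'Wavelength', 'Spectrum']
print(body[0])                     # ['100', '2809.485', 'Ne II']
print(split_spectrum(body[0][2]))  # ('Ne', 'II')
```

The parser only keeps strings; converting them to numbers (and attaching units) is left to whatever container the project settles on.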


tepickering · 2018-06-21T16:12:08Z

looks like this and #30 cover some of the same ground. a few comments:

- i think we should pick a convention for returning wavelength/intensity and be consistent. i'm fairly agnostic whether it's an astropy Table, a DataFrame, or a tuple of numpy arrays, but all methods that return them should do so in the same way.
- if we use pandas, it needs to be added to the conda dependencies in .travis.yml.
- BeautifulSoup is a cleaner, more concise way of parsing HTML, but it does add an extra dependency vs. the built-in HTMLParser. OTOH, it's available via the standard conda channel, so it's not that onerous to install, and it is widely used. we should look at what redundancies, if any, there are between parsing the cgi interface in #30 and the html tables here, and generalize as much as possible.
- new code should include tests that cover it.
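One way to combine the consistency and testing points: have every retrieval path (the cgi query in #30, the html tables here) pass its parsed rows to a single normalizing function, and test that function directly. The sketch below assumes a hypothetical `normalize_lines` that returns a tuple of plain Python lists sorted by wavelength; the same pattern applies whichever container (Table, DataFrame, or numpy arrays) is finally chosen.

```python
def normalize_lines(rows, wavelength_key="wavelength", intensity_key="rel_int"):
    """Convert parsed rows (dicts of strings) into (wavelengths, intensities).

    Hypothetical shared helper: every retrieval method would call it, so all
    of them return the same structure. Rows whose wavelength cannot be parsed
    (repeated headers, blank cells, footnote rows) are skipped. A missing or
    unparsable intensity becomes NaN. Output is sorted by wavelength.
    """
    pairs = []
    for row in rows:
        try:
            wavelength = float(row[wavelength_key])
        except (KeyError, TypeError, ValueError):
            continue
        try:
            intensity = float(row.get(intensity_key, "nan"))
        except (TypeError, ValueError):
            intensity = float("nan")
        pairs.append((wavelength, intensity))
    pairs.sort(key=lambda pair: pair[0])
    wavelengths = [w for w, _ in pairs]
    intensities = [i for _, i in pairs]
    return wavelengths, intensities


def test_normalize_lines():
    """pytest-style test: sorting, header-row skipping, float conversion."""
    rows = [
        {"wavelength": "2906.592", "rel_int": "80"},
        {"wavelength": "Wavelength", "rel_int": "Intensity"},  # repeated header
        {"wavelength": "2809.485", "rel_int": "100"},
    ]
    wavelengths, intensities = normalize_lines(rows)
    assert wavelengths == [2809.485, 2906.592]
    assert intensities == [100.0, 80.0]
```

Because the test exercises only the shared helper, each parser can be tested separately against a small saved HTML or text snippet without hitting the NIST servers.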

simontorres · 2018-06-21T22:15:59Z

They do indeed do roughly the same thing; my idea was to provide a "proof of concept" (if it can be called that) to contribute to the discussion. I agree that BeautifulSoup is cleaner, and I must admit I have not used it.

Regarding the convention for returning the data, I would definitely go with pandas.DataFrame for its flexibility, for instance when filtering the data.

I have created a quick example.

import pandas as pd

data_neon = {'rel_int': [100, 80, 80, 90, 90],
             'wavelength': [2809.485, 2906.592, 2906.816, 2910.061, 2910.408],
             'spectrum': ['Ne II', 'Ne II', 'Ne II', 'Ne II', 'Ne II'],
             'reference': ['P71', 'P71', 'P71', 'P71', 'P71']}


df = pd.DataFrame(data=data_neon, columns=['rel_int',
                                           'wavelength',
                                           'spectrum',
                                           'reference'])
# the DataFrame object
print("The DataFrame Object")
print(df)

The DataFrame Object
   rel_int  wavelength spectrum reference
0      100    2809.485    Ne II       P71
1       80    2906.592    Ne II       P71
2       80    2906.816    Ne II       P71
3       90    2910.061    Ne II       P71
4       90    2910.408    Ne II       P71

# selecting the three most intense
print("Selecting the three most intense")

three_most_intense = df.sort_values('rel_int', ascending=False)
three_most_intense = three_most_intense[:3].sort_values('wavelength')
print(three_most_intense)

Selecting the three most intense
   rel_int  wavelength spectrum reference
0      100    2809.485    Ne II       P71
3       90    2910.061    Ne II       P71
4       90    2910.408    Ne II       P71

print('select between 2900 and 2910')

print(df[((df.wavelength > 2900) & (df.wavelength < 2910))])

select between 2900 and 2910
   rel_int  wavelength spectrum reference
1       80    2906.592    Ne II       P71
2       80    2906.816    Ne II       P71

tepickering · 2018-06-22T17:40:28Z

i'm a big pandas fan, but i'll point out one big disadvantage that i just realized: lack of units support. even in your example, it's not clear if the wavelengths are in Å or nm.

it also doesn't appear that units support is coming to pandas any time soon: pandas-dev/pandas#15698

to avoid headaches down the road, i think it's important that at least the wavelengths returned by line list utilities contain explicit metadata describing the wavelength units. this leaves either a Quantity array or a QTable. using pandas within the methods/functions is fine and very handy, though, as you show.

bsipocz · 2018-06-22T17:46:55Z

hmm, for the sake of compatibility with the rest of the stack, imo Table/QTable should be preferred over pandas, unless of course a crucial functionality of the latter is used that is not available with the astropy framework.

simontorres · 2018-06-25T15:35:46Z

I guess I should close this as well, based on the conclusion of #30.

Simon Torres added 4 commits June 19, 2018 18:02

changed nist strong line files' extension from '.txt' to '.csv' which more accurately represents the format.

bfe3381

created utils package

7d7200c

added NIST strong lines web crawler/scrapper that returns a pandas DataFrame

d3fbf4d

created an example function to load nist strong lines files that returns a pandas DataFrame in the same format as the webcrawler

e1b7d54

tepickering mentioned this pull request Jun 22, 2018

Created method to retrieve the wavelength lines from NIST. #30

Closed

simontorres closed this Jun 25, 2018
