Closed
Description
In the current implementation, the Python function to call has to be specified in the repo, in dvcsummon.yaml:
objects:
  - name: sea_ice
    summon:
      type: python
      call: myfunc
      deps: ['sea_ice.csv']
The function itself also has to be defined in the repo; otherwise the call fails:
>>> import dvc.api
>>> def myfunc(): print("hello myfunc!!!")
>>> df = dvc.api.summon('sea_ice', 'https://github.com/iterative/df_sea_ice_no_header', rev='dmpetrov-patch-1')
...
ModuleNotFoundError: No module named 'myfunc'
Users need to be able to call their own code and specify the function from that code. Like this:
>>> import dvc.api
>>> def myfunc(): print("hello myfunc!!!")
>>> df = dvc.api.summon('sea_ice', 'https://github.com/iterative/df_sea_ice_no_header', rev='dmpetrov-patch-1', call=myfunc)
hello myfunc!!!
Or, as in the description of the initial issue #2719:
name: some_dataframe
call: pandas.read_csv
- Define function in user code.
- Specify function name in user's code (dvc.api.summon(), not in dvcsummon.yaml)
- Pass parameters (and probably parsed yaml object)
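The three points above could look roughly like the sketch below. It is not the DVC implementation: `resolve_callable`, `summon_sketch`, the `args` key, and the `pass_desc` flag are hypothetical names chosen for illustration, and the parsed YAML entry is represented as a plain dict. A callable passed by the user takes precedence; otherwise the `call` string is imported as a dotted path such as `pandas.read_csv`.

```python
import importlib
from typing import Any, Callable, Mapping, Optional, Union


def resolve_callable(target: Union[str, Callable[..., Any]]) -> Callable[..., Any]:
    """Return `target` if it is already callable, else import it from a dotted path."""
    if callable(target):
        return target
    module_name, _, attr = target.rpartition(".")
    if not module_name:
        # A bare name such as "myfunc" carries no module to import from.
        raise ValueError(f"{target!r} is not a dotted path like 'package.func'")
    return getattr(importlib.import_module(module_name), attr)


def summon_sketch(
    desc: Mapping[str, Any],
    call: Optional[Union[str, Callable[..., Any]]] = None,
    args: Optional[Mapping[str, Any]] = None,
    pass_desc: bool = False,
) -> Any:
    """Invoke the user's `call` if given, else the one named in the parsed entry."""
    summon = desc["summon"]
    func = resolve_callable(call if call is not None else summon["call"])
    # Hypothetical "args" key: defaults from the YAML, overridden by the caller.
    kwargs = {**summon.get("args", {}), **(args or {})}
    if pass_desc:
        # Optionally hand the parsed YAML entry to the user's function.
        kwargs["desc"] = desc
    return func(**kwargs)


# Parsed form of the dvcsummon.yaml entry from the description.
desc = {
    "name": "sea_ice",
    "summon": {"type": "python", "call": "myfunc", "deps": ["sea_ice.csv"]},
}


def myfunc():
    return "hello myfunc!!!"


print(summon_sketch(desc, call=myfunc))  # hello myfunc!!!
```

Passing the callable object rather than its name avoids the import lookup entirely, which is why a function defined only in the user's session works here.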