
Develop an Introductory CWL tutorial #160


Closed
tobyhodges opened this issue Aug 16, 2019 · 25 comments

@tobyhodges
Contributor

tl;dr: We need a tutorial/lesson aimed at teaching CWL to complete beginners. Are you interested in helping to make this happen?

Based on discussions with various people in the CWL user/developer community, and my own experience learning and implementing the language, I believe that what we need is two distinct presentations of information for non-experts:

  1. material designed specifically to teach CWL to novices (I'll refer to this as a lesson from now on), and
  2. documentation designed to be used as a reference for those who know enough of the vocabulary and the approach to thinking about describing tools and workflows (these people are sometimes called competent practitioners - see figure below)

[Figure: stages of learning (image from Carpentries Instructor Training)]

To use the analogy that I gave to @mr-c earlier today: the following resources are available to Python users, catering to people at different stages of learning the language:

In its current form, the CWL User Guide falls somewhere between the first two above: it is not thorough enough on the basics, lacks advice on how to think about workflows, and has too steep a learning curve to be friendly for total beginners, yet it is not detailed or exhaustive enough for those who want to move beyond the basics.

I propose that we work together to develop a Carpentries-style lesson, aimed at teaching enough CWL and "workflow thinking" to move learners from "novice" to "competent practitioner" stage. This would take the same approach used by Software or Data Carpentry to develop material with a manageable learning curve, containing plenty of exercises, with clear and achievable learning objectives, and focusing on the concepts that novices find difficult about learning CWL. We might even consider basing (parts of) it on the existing Software Carpentry Make lesson - or we might decide to start from scratch.

Note: I am absolutely not a CWL expert. What I can offer:

  1. Guidance on developing good lesson material, writing good exercises, learning objectives, etc
  2. Explanation and support with developing teaching material using the Carpentries lesson template
  3. The opportunity for contributors to come to EMBL in 2020 to help teach this lesson as a workshop for bioinformaticians/computational biologists who want to start using CWL to describe their analyses (This offer is probably limited to travel within Europe only, I'm afraid)

What I can't offer:

  1. Loads of time to work on writing material myself
  2. Expert knowledge of CWL
  3. Extensive experience of using CWL

Questions that I'm asking here:

  1. Do you think this is a good idea?
  2. What other resources already exist out there that teach CWL?
  3. What were the hardest things for you when learning CWL? (Particularly interested to hear from anyone who wasn't already a computing/workflows expert here.)
  4. Would you be interested in helping develop this material?
@tobyhodges tobyhodges self-assigned this Aug 16, 2019
@fpsom

fpsom commented Aug 17, 2019

I have very limited experience with CWL (mostly dabbling around in toy workflows), but I can confidently answer #1 (Absolutely YES) and #4 (I'd love to, as much as I can). For #3 my response would be the mental transition from a "script-based" model to a descriptive/abstract one.

@pvanheus

  1. I think it is, but using what technology? Do you expect people to hand-write CWL, or to use something like Rabix Composer?
  2. I learned by reading the User Guide and the specification (and other examples of CWL). So in my experience there is not a lot out there.
  3. Syntax threw me: understanding in vs inputs, and also some technical details like secondary files, valueFrom statements, and expressions vs inline JavaScript. I still feel I don't understand the Directory type.
  4. Yep, in my limited free time.
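
The difference between a tool's inputs section and a workflow step's in section can be sketched with a small workflow. This is a minimal illustration under stated assumptions, not an excerpt from the User Guide: the names greeting, say_hello, text, and out are made up for the example. The inputs section declares what a tool or workflow accepts, the in section of a step wires each declared tool input to a source, and valueFrom transforms the wired value. Because the valueFrom here is a JavaScript expression rather than a plain parameter reference, the sketch also declares InlineJavascriptRequirement.

```yaml
cwlVersion: v1.0
class: Workflow

requirements:
  StepInputExpressionRequirement: {}   # allows valueFrom on step inputs
  InlineJavascriptRequirement: {}      # needed because valueFrom below uses JavaScript

inputs:                  # workflow-level declaration: what the caller supplies
  greeting: string

outputs:
  message_file:
    type: File
    outputSource: say_hello/out        # written as step_id/output_id

steps:
  say_hello:
    run:                 # tool description embedded inline to keep the sketch self-contained
      class: CommandLineTool
      baseCommand: echo
      stdout: message.txt
      inputs:            # tool-level declaration: what the tool accepts
        text:
          type: string
          inputBinding:
            position: 1
      outputs:
        out:
          type: stdout
    in:                  # step-level wiring: tool input id -> where its value comes from
      text:
        source: greeting
        valueFrom: '$(self + ", world")'   # self is the value taken from source
    out: [out]
```

A plain parameter reference such as $(inputs.text) needs no extra requirement, whereas anything beyond simple property access, like the string concatenation above, is treated as JavaScript.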

@ksebby

ksebby commented Aug 19, 2019

  1. Yes, I think that anything that makes documentation simpler and provides tutorials and examples is a good idea.
  2. I used these:
  3. Very similar to @pvanheus. Also, where there are multiple ways to specify things (map vs. array), the syntax for some requirements (can't think of a specific example now) took a bit of figuring out.
  4. Won't have much time to develop material but would be happy to try it out and edit.
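
The map vs. array point can be illustrated with a small tool description. This is only a sketch; the input name message is hypothetical. CWL accepts both spellings for several fields, including requirements and inputs, so the same tool can be written either way. The alternative form of each section is shown in comments.

```yaml
cwlVersion: v1.0
class: CommandLineTool
baseCommand: echo

# Array form: each entry names itself with a class field.
requirements:
  - class: InlineJavascriptRequirement
# Equivalent map form, where the class name becomes the key:
# requirements:
#   InlineJavascriptRequirement: {}

# Map form: the keys are the input ids.
inputs:
  message:
    type: string
    inputBinding:
      position: 1
# Equivalent array form, where each entry carries its own id:
# inputs:
#   - id: message
#     type: string
#     inputBinding:
#       position: 1

outputs: []
```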

@skanwal

skanwal commented Aug 20, 2019

Thanks for raising this issue. I would categorise myself as somewhere in the middle on the experience axis (competent, but I still look stuff up).

  1. A big yes! Anything to help make learning CWL easier is/will/should be greatly appreciated.
  2. I personally refer to http://www.commonwl.org/user_guide/ , CWL-biostars and gitter CWL channel to ask for help or clarifications.
  3. Hmm, I think it took me a while to understand the connection between inputs/outputs in the main workflow vs. the tool files.
  4. As much as I can (depending on the work schedule and deadlines), either with writing or testing.

@denis-yuen
Member

  1. Seems like a good idea
  2. (ditto from above)
  3. What things are standardized in CWL and what things are left up to implementers, what things in cwltool are guaranteed to be in other engines and what things aren't, and yeah, some of the syntax from the answers above
  4. Realistically, I probably don't have bandwidth to write.
    I'd be happy to put some eyes on reviewing, and I do have the ability to assign developers new to CWL to test, though

@smoe

smoe commented Aug 20, 2019

Please go for it.

I am not ultimately convinced, though, that novices should need to see the CWL itself. They should rather feel comfortable to use it. So, beyond some "hello, this is me, CWL, but you don't really need to care" I see introductions on sets of readily usable CWL-implemented workflows that are available for the community to run. The tutorial could then explain these workflows. There shall be quick ones that run on a laptop. And others that need queueing systems or that run in the cloud somewhere. The message to get across would be that for every execution environment there is something to execute the CWL workflow and the user does not need to change the CWL bits at all, and the tutorial shows how it is done. So, by accepting the CWL in your life you would have all you need to run current routine workflows in what Bioinformatics Service Groups are addressing all over the world. And there are no limits to compute time either thanks to a fairly smooth transition into the cloud.

The advanced user will then adjust existing workflows.

The expert sends patches to this github project?

@tobyhodges
Contributor Author

tobyhodges commented Aug 21, 2019

Thanks everyone for the feedback so far - very useful indeed! I'll keep this discussion open for a while longer to allow others to contribute and will also lead one of the weekly CWL meetings, on 10 Sep 2019, when we will discuss the subject further.

[edit: link to receive invitation to the meeting mentioned above: https://groups.google.com/forum/#!forum/common-workflow-language-videochat-invites ]

@golharam

  1. Yes - excellent idea.

  2. None. The CWL website is what I use, primarily. I just posted my first question to the Google group because I can't find an answer for one of my use cases.

  3. Docs, in some places, are too technical, especially the getting started section. They also address only very simple, basic examples. As I'm converting a workflow, I'm trying to follow the docs but also apply what I'm reading to my existing workflow. This is proving harder than I thought.

  4. Yes, I would.

@tobyhodges
Contributor Author

@golharam thanks for your enthusiastic response. Are you able to elaborate on your answer to 3? What specifically did you find too technical in this User Guide? While converting your workflow, what stage(s) did you find particularly difficult? It's these kinds of "pain points" that I would hope to specifically address in the tutorial.

@golharam

@tobyhodges -

  1. Minor point, but on https://www.commonwl.org/user_guide/02-1st-example/index.html, cwl-runner is actually cwltool with the reference implementation. I don't have a cwl-runner at this time.
  2. https://www.commonwl.org/user_guide/06-params/index.html, the section "Where are parameter references allowed?" seems too in-depth for a getting-started guide. Maybe this should be a link to a reference guide instead.
  3. https://www.commonwl.org/user_guide/05-stdout/index.html, an obvious question at this point, to me, is how to rename the output file from output.txt to something else. Presumably, I've got enough knowledge now to do some basic things. I'm going to start exploring with my own stuff.

Maybe I can download the docs and mark them up with my notes somehow and send them back. Would that help?
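
On the question of renaming output.txt: one way to do it, sketched here under assumptions rather than quoted from the User Guide, is to make the stdout field a parameter reference instead of a fixed string. The input name output_name and its default value are hypothetical; the rest follows the general shape of an echo tool that captures standard output.

```yaml
cwlVersion: v1.0
class: CommandLineTool
baseCommand: echo

# The captured output is written to whatever file name the caller supplies.
stdout: $(inputs.output_name)

inputs:
  message:
    type: string
    inputBinding:
      position: 1          # appears on the command line after echo
  output_name:
    type: string
    default: greeting.txt  # used when the job file does not set output_name
    # no inputBinding, so this value is not added to the command line

outputs:
  example_out:
    type: stdout           # collects the file named by the stdout field
```

Since $(inputs.output_name) is a plain parameter reference, no InlineJavascriptRequirement is needed here.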

@tobyhodges
Contributor Author

Sounds great - thank you @golharam. There's also no reason why we can't address some of these points with fixes to this UG itself - pull requests to this repository are welcome :)

@DiDeoxy

DiDeoxy commented Aug 27, 2019

  1. Do you think this is a good idea?

I really do. I'm shocked it doesn't exist already; it's been several years since release. Are we not supposed to be writing CWL by hand?

  2. What other resources already exist out there that teach CWL?

User guide, technical documentation, GitHub repositories, Gitter, Discourse, Biostars. None is a comprehensive introduction for beginners, though. It's really tough to learn CWL currently.

  3. What were the hardest things for you when learning CWL? (Particularly interested to hear from anyone who wasn't already a computing/workflows expert here.)

Dependent inputs in workflows.

  4. Would you be interested in helping develop this material?

I could contribute questions and maybe some writing of explanations (would likely need review though)

@mr-c mr-c pinned this issue Sep 1, 2019
@stain
Member

stain commented Sep 3, 2019

Sign me up! We also want to work on this in the BioExcel context, in particular for developing virtual training on CWL together with EBI.

@ttubb

ttubb commented Sep 4, 2019

Developing CWL workflows was a significant part of what I did for the last few months. With a biology/biotechnology background, getting started was pretty rough. The user guide is really lacking; it took about a day of experimenting until I faced problems which required reading the specifications. And I feel those are really not ideal for somebody trying to grasp the basics of developing workflows.

I'm definitely not an expert, but I've written quite a few tool wrappers / workflows by now. So there is probably some knowledge I could impart to new users.

To answer your questions individually:

  1. Do you think this is a good idea?
  • Yes! A comprehensive tutorial would have saved me from lots of guessing and trial and error. I also see some questions pop up on Biostars again and again, so material regarding those would probably help a lot of folk.
  2. What other resources already exist out there that teach CWL?
  • I only know of the user guide and specifications at commonwl.org. A slew of old Biostars threads also helped me with learning.
  3. What were the hardest things for you when learning CWL? (Particularly interested to hear from anyone who wasn't already a computing/workflows expert here.)
  • How to organize output of tools/workflows (rename files, put them in folders, ...)
  • The fact that CWL uses symlinks and runs in temp directories. I really struggled making some tools work because of this.
  • What parameters I need to pass to cwltool to make my workflows function properly (--enable-ext, --eval-timeout, --preserve-environment, etc.)
  • Containerization/Docker (not in the scope of this, but might be addressed just by linking to some existing tutorials)
  4. Would you be interested in helping develop this material?
  • Yes. I will probably have some spare time to work on this during the next 2 or 3 months.

@ttubb

ttubb commented Sep 15, 2019


Since the Sep 10th meeting was canceled, can you provide us with a new date for the discussion?

@tobyhodges
Contributor Author

Thanks for your thoughtful response, @ttubb. I'll be back from parental leave in early-mid October and hope to be able to host the rescheduled call soon after.

@mr-c
Member

mr-c commented Oct 22, 2019

The call has been rescheduled to Tuesday, October 29th at 14:30 UTC via https://meet.jit.si/cwl

Dial in information:
To join by phone instead, tap this: +1.512.402.2718,,3877533315#
Looking for a different dial-in number?
See meeting dial-in numbers: https://meet.jit.si/static/dialInInfo.html?room=cwl
If also dialing-in through a room phone, join without connecting to audio: https://meet.jit.si/cwl#config.startSilent=true

I am excited to hear from y'all!

@mr-c
Member

mr-c commented Oct 29, 2019

Reminder: We are starting in less than 5 minutes

https://meet.jit.si/cwl


@dleehr

dleehr commented Oct 29, 2019

Thanks @tobyhodges, great to hear about this effort. I think it's a great idea and would be interested in helping develop the material. I've taught data carpentry and software carpentry workshops and expect to be going through the instructor training at some point in the near future. I'm excited to see workflow lessons in the carpentries mold.

One of the hardest things for me learning CWL was writing workflows to connect tools that didn't fit together cleanly. I wrote a lot of "glue" CommandLineTools and built containers to do those things initially, but this was cumbersome.

Eventually I found ExpressionTools and InitialWorkDirRequirement to place a script into the tool description as a cleaner way to do this. That's been working well but I feel these are advanced techniques and that there's no easy/obvious solution for novices. Also, when faced with such a challenge, it's very tempting to fall back to just writing one big script as your "workflow" and mix in all the glue code right there.
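
The InitialWorkDirRequirement approach described above might look roughly like the following. This is a minimal sketch, not the commenter's actual code: the script name count_lines.sh, its contents, and the input and output ids are all hypothetical. The listing entry writes the glue script into the tool's working directory before the command runs, so no separate container or wrapper tool is needed.

```yaml
cwlVersion: v1.0
class: CommandLineTool

requirements:
  InitialWorkDirRequirement:
    listing:
      - entryname: count_lines.sh      # hypothetical glue script, created in the working directory
        entry: |
          #!/bin/sh
          # Print the number of lines in the file given as the first argument.
          wc -l < "$1"

baseCommand: [sh, count_lines.sh]
stdout: line_count.txt

inputs:
  text_file:
    type: File
    inputBinding:
      position: 1                      # passed to the script as $1

outputs:
  line_count:
    type: stdout
```

One thing to watch in such scripts is that CWL treats $( and ${ inside entry as the start of a parameter reference or expression, so shell constructs using those sequences need escaping; the sketch avoids them by using only "$1".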

@tetron
Member

tetron commented Oct 29, 2019

I'd like to contribute.

I came across a model for documentation that divides it up into "tutorials", "how-to guides", "explanation" and "technical reference": https://www.divio.com/blog/documentation/

We're specifically focused on the novice user for this project. A separate project could look at organizing/creating how-to guides and explanations that help the competent practitioner get work done.

@ttubb

ttubb commented Oct 29, 2019

A thought after today's call: Some learners might only be interested in how to modify existing workflows, not write entire CommandLineTools or the like. It would be nice if those users quickly got to a point which enables them to do that, without having to go through unnecessary lessons. I am not sure how exactly this can be addressed, but it might be an aspect to keep in mind.

To expand on why it seems important to me: For what I did with CWL so far, a workflow of pretty similar steps has to be carried out, but a variety of tools is available for each step. I have built wrappers (+workflows) for the most popular/useful of these tools. All manner of different combinations of these wrappers are sensible, depending on the use case. I can share my wrappers along with some workflows as examples, but all potential users would have to modify these slightly to fit their data / analysis hardware / computational resources and the problem they work on.

@tetron
Member

tetron commented Oct 29, 2019


That's a great point. Even among learners, some people might only need to know enough to edit an existing workflow, or connect existing tools. On the other hand, other learners may not have access to existing CWL for their research area and would necessarily need to learn how to write tools and workflows from scratch.

@tobyhodges
Contributor Author

I've sent out invitations to collaborate on https://github.com/common-workflow-lab/cwl-novice-tutorial, the repository where work on this tutorial will begin. If anyone else reading this would like to be involved in developing this material, please post here or get in touch with me by some other channel and I'll add you too.

Once we have something to review, I'll post back here so that others can take a look.

@cwl-bot

cwl-bot commented Oct 6, 2020

This issue has been mentioned on Common Workflow Language Discourse. There might be relevant details there:

https://cwl.discourse.group/t/developing-personas-and-pathways-for-a-diverse-and-inclusive-cwl/214/1

@kinow kinow unpinned this issue Jul 7, 2022
@kinow kinow pinned this issue Jul 7, 2022
@mr-c
Member

mr-c commented Aug 17, 2022

Success! See https://github.com/carpentries-incubator/cwl-novice-tutorial & https://carpentries-incubator.github.io/cwl-novice-tutorial/

@mr-c mr-c closed this as completed Aug 17, 2022
@kinow kinow unpinned this issue Aug 26, 2022