Fix solution in Link Checker in Concurrency Morning exercises #904
@mgeisler This is a lot of code, and I'm pretty new at Rust. I won't be upset if you merge and change / rewrite, if that's easier than writing up a lot of feedback 😄 |
855c5f9 to ab46ed5
This looks good! I added a few rustdoc comments. There are some interesting things to discuss here, such as using a Mutex to make a "mpmc" channel.
@qwandor, ptal |
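For readers following along, here is a minimal sketch of the "Mutex as mpmc channel" idea mentioned above. It is not the code from this PR: `run_pool` and the `process` callback are hypothetical names, and `process` merely stands in for the slow step of fetching a page. The point is that `std::sync::mpsc::Receiver` cannot be cloned, so a fixed number of workers share one receiver through `Arc<Mutex<_>>`.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

/// Runs `jobs` through `worker_count` threads that all pull from one shared
/// receiver. `process` is a placeholder for the slow I/O step.
fn run_pool(
    jobs: Vec<String>,
    worker_count: usize,
    process: fn(&str) -> usize,
) -> Vec<(String, usize)> {
    let (job_tx, job_rx) = mpsc::channel::<String>();
    let (result_tx, result_rx) = mpsc::channel::<(String, usize)>();
    // The standard library's Receiver is single-consumer, so wrap it for sharing.
    let job_rx = Arc::new(Mutex::new(job_rx));

    let mut handles = Vec::new();
    for _ in 0..worker_count {
        let job_rx = Arc::clone(&job_rx);
        let result_tx = result_tx.clone();
        handles.push(thread::spawn(move || loop {
            // The lock guard is dropped at the end of this statement, so the
            // lock is held while waiting for a job but not while processing it.
            let job = job_rx.lock().unwrap().recv();
            match job {
                Ok(url) => {
                    let n = process(&url);
                    result_tx.send((url, n)).unwrap();
                }
                // All senders are gone: no more work will arrive.
                Err(_) => break,
            }
        }));
    }
    // Drop the original sender so `result_rx` ends once every worker exits.
    drop(result_tx);

    for job in jobs {
        job_tx.send(job).unwrap();
    }
    // Closing the job channel tells the workers to stop.
    drop(job_tx);

    let mut results: Vec<(String, usize)> = result_rx.iter().collect();
    for handle in handles {
        handle.join().unwrap();
    }
    results.sort();
    results
}
```

Calling `run_pool(urls, 4, fetch_len)` with some hypothetical `fetch_len` function would process all URLs with at most four threads, no matter how many URLs are queued.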
@djmitche Thank you very much for the comment improvements! In case it helps, I came up with the Arc+Mutex while trying to follow the exercise instructions at comprehensive-rust/src/exercises/concurrency/link-checker.md Lines 80 to 81 in 4413b6a
I also considered the I wrote my thoughts above hoping they help us come up with the best possible prompt text for future students. |
Hm, those instructions might be too limiting. I could see solving this with a condition variable or a semaphore, too. In general, we've tried to make the exercises fairly open-ended. Of course, the solution will only pick one way to do it. The way you've chosen is a good one! |
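To illustrate the condition-variable alternative mentioned above, here is a rough sketch of a blocking work queue built from `Mutex` and `Condvar`. The `WorkQueue` type and its method names are made up for this example and do not come from the course material; it is one possible shape, not the recommended one.

```rust
use std::collections::VecDeque;
use std::sync::{Condvar, Mutex};

/// A minimal blocking work queue (hypothetical type for illustration).
struct WorkQueue<T> {
    /// Pending items plus a flag that is set once no more items will arrive.
    state: Mutex<(VecDeque<T>, bool)>,
    /// Signalled whenever an item is pushed or the queue is closed.
    ready: Condvar,
}

impl<T> WorkQueue<T> {
    fn new() -> Self {
        WorkQueue {
            state: Mutex::new((VecDeque::new(), false)),
            ready: Condvar::new(),
        }
    }

    /// Adds an item and wakes one waiting worker.
    fn push(&self, item: T) {
        self.state.lock().unwrap().0.push_back(item);
        self.ready.notify_one();
    }

    /// Marks the queue as finished and wakes every waiting worker.
    fn close(&self) {
        self.state.lock().unwrap().1 = true;
        self.ready.notify_all();
    }

    /// Blocks until an item is available. Returns `None` once the queue is
    /// closed and fully drained.
    fn pop(&self) -> Option<T> {
        let mut state = self.state.lock().unwrap();
        loop {
            if let Some(item) = state.0.pop_front() {
                return Some(item);
            }
            if state.1 {
                return None;
            }
            // Releases the lock while sleeping and re-acquires it on wake-up.
            // The loop re-checks the condition to handle spurious wake-ups.
            state = self.ready.wait(state).unwrap();
        }
    }
}
```

Worker threads would share the queue through an `Arc<WorkQueue<_>>` and loop on `pop()` until it returns `None`. Compared with wrapping a channel receiver in a mutex, this makes the waiting logic explicit, at the cost of more code.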
Thanks a lot for putting this up: people have been writing me asking for a solution 😬 In fact, we have #816 open for this 😄 Fun fact: to test this, I downloaded the code and tried to crawl https://google.github.io/comprehensive-rust/ instead, since I was curious whether we have broken links there! That soon resulted in
errors and a bit later, I was blocked from posting this comment... luckily, the rate limit was lifted a few minutes later. I noticed one thing:
It would be great to clean the URLs of any query string and anchor texts before inserting them into |
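A small sketch of the URL cleaning suggested above, using only the standard library. Both function names are hypothetical, and the `HashSet` of visited URLs is an assumption about where the cleaned URLs end up. With the `url` crate's `Url` type, calling `set_query(None)` and `set_fragment(None)` should achieve the same thing on a parsed URL.

```rust
use std::collections::HashSet;

/// Returns `url` without its query string (`?...`) and fragment (`#...`).
/// This is plain string slicing and assumes `url` is already a valid URL.
fn strip_query_and_fragment(url: &str) -> &str {
    let end = url
        .find(|c: char| c == '?' || c == '#')
        .unwrap_or(url.len());
    &url[..end]
}

/// Records the cleaned URL in `visited` (assumed here to be a set of pages
/// already seen). Returns `true` if the page had not been seen before.
fn mark_visited(visited: &mut HashSet<String>, url: &str) -> bool {
    visited.insert(strip_query_and_fragment(url).to_string())
}
```

With this, `page.html#intro` and `page.html#summary` count as the same page, so the crawler would fetch it only once.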
Another thing I've noticed when doing this exercise in class: it's super hard and I wish I had provided much more scaffolding for people to use. Your PR is already a great improvement, but I feel we should
I'm an old-school web developer, but I've had people in my classes who've never done any web development whatsoever. HTTP status codes are foreign to them, HTML is foreign to them, URL fragments are foreign to them. Adding more scaffolding would probably help a lot there to make the exercise useful to people from more backgrounds. After this PR is merged, I would also like to include a diagram or two which shows the intended interaction between the crawler threads, the channels, and the main thread. I've had people in my classes who took the old link extraction logic and ran that in a multi-threaded way, even though it is a purely local and super fast operation. So we would need more detail explaining that it is the IO operations that are slow, and that those are what we seek to parallelize here. |
@mgeisler I fully agree with your thoughts above! Would you be OK if I copied your thoughts (and my reply) into a separate Issue? I see this PR as making a first step in what is hopefully the right direction. This PR brings the solution in line with the requirements outlined in the problem, with minimal scaffolding changes. I think that, once we agree that there are no obvious simplifications to the solution, we can conclude that the problem is too difficult. Once we're there, we can think of possible fixes. I would be most interested in prototyping a scaffolding change where we give students a WDYT? |
My extremely unedited solution with a single channel and no mutex |
Yes, definitely!
I completely agree, we should merge this now since it's a huge improvement on what we have.
My experience is that people find this to be a hard exercise — most find it too hard for the 30-40 minutes we normally have at this point in the course. I think less than 25% of the people in a class manage to write a working solution in that time. |
Hi @jmpfar, thanks for posting this! Your solution is not bad: it delegates the I/O-heavy operations to new threads and does the CPU-intensive work in the main thread. Many people don't get to that stage. A small problem with your solution is that you have an unbounded number of threads: if a page has 100 outgoing links, you spawn 100 threads. To solve that, you more or less have to create a multi-producer-multi-consumer abstraction: you need this to have a bounded number of threads read from a single channel. |
@mgeisler ah, you are right 😬 Anyway from my side I can attest this was the hardest exercise in the course (minus android and baremetal which I didn't do) and it took me significant time (but was fun all in all :) |
Thanks, that matches my impression too. I'm glad you liked it despite it being hard 😄 Perhaps it would be good to call it a "take home exercise" instead since it's really a small project and not a real exercise with a clear answer. |
Let's get this merged since it's an excellent start for further updates. @fw-immunant has put up some PRs to restructure the course so he might have further input down the line.
Hi @pwnall, the build fails because a code block is not marked with
When we run |
Thank you for decoding the error for me! Let me figure out how to modify this chain to make the build green. I haven't updated a branch with someone else's commits before 😄 |
This change fixes the following issues with the current solution: 1. It is not listed on the "solutions" page. 2. It is not multi-threaded and does not use channels.
674e813
to
104ab9c
Compare
@mgeisler Thank you again for the tips! The build is green now :) |
Yeah, looks good! Just for your information, we squash-merge all PRs in this repository. This makes it a bit easier for people since they can just add more and more commits to fix review comments. The commits all disappear from the main history when the branch is squash-merged into |
A small note: it feels terrible to have large code snippets that say |
…#904) * Fix solution in Link Checker in Concurrency Morning exercises. This change fixes the following issues with the current solution: 1. It is not listed on the "solutions" page. 2. It is not multi-threaded and does not use channels. --------- Co-authored-by: Dustin J. Mitchell <[email protected]>
This change fixes the following issues with the current solution:
Fixes #816.