Randomly hang once a day #196
Comments
I've had this happen as well, though not as often as once a day. There is no indication of an error in the nextdns logs and nextdns status says running, but DNS does not resolve. I'm using an EdgeRouter X SFP on v2.0.8-hotfix.1 and keeping current with nextdns releases.
Mine has the same issue since I enabled discovery services. This morning, for example, it wasn't resolving any domains and there was no error message; the last log entry was an mDNS message. Sometimes it kernel panics and the log console goes haywire, then it restarts itself just fine. I will disable discovery and go back to my first config, which had no issues. After all, with a conditional forwarder set, it can't even discover host names in /etc/hosts or reverse-resolve local IPs.
It does discover IPs in /etc/hosts. Doesn't it work for you?
I reset OpenWrt, installed the latest NextDNS CLI, set it to listen on 192.168.1.253:5342, set it as the dnsmasq forwarder, then populated /etc/hosts, and it worked. As for the hang, let's see if the latest version still does it...
It still hangs without giving any error. Nothing resolves until I restart nextdns and dnsmasq. I'm going to automate it with the scheduler, restart both once a day, and see how it goes. Edit: OpenWrt 19.07.2
It's dnsmasq that's hanging! I just found that out. That is also why we see no NextDNS errors in the log: nextdns just keeps waiting for queries that never reach it.
I put in a script that runs every 5 minutes, queries the name of one of my local devices, and checks whether an IP comes back (the device name and IP are in /etc/hosts, so no query goes to NextDNS unnecessarily). If no IP comes back, dnsmasq is not working and the script restarts it. If it helps anyone, just ask and I will share the script.
@Xtreme512 Care to share the script? I'm seeing this on a brand-new install of an EdgeRouter X with v1.10.11 and nextdns-v1.5.8 installed. A restart fixes it each time.
If you can, try master, and the next time it hangs, send kill -QUIT to its pid before restarting so it dumps a full stack trace in the logs. That trace would help us understand what's going on.
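Some background on why a QUIT signal yields a stack trace: per the os/signal documentation, a Go program that does not handle SIGQUIT exits with a stack dump. A daemon that wants the dump in its own log, and wants to keep running, can catch the signal itself. Below is a minimal sketch of that general technique using os/signal and runtime.Stack; it is not the nextdns code, and the names allStacks and dumpOnQuit are hypothetical.

```go
package main

import (
	"log"
	"os"
	"os/signal"
	"runtime"
	"syscall"
	"time"
)

// allStacks returns the stack traces of all goroutines, doubling the
// buffer until runtime.Stack no longer fills it completely.
func allStacks() []byte {
	buf := make([]byte, 64<<10)
	for {
		n := runtime.Stack(buf, true) // true = include every goroutine
		if n < len(buf) {
			return buf[:n]
		}
		buf = make([]byte, 2*len(buf))
	}
}

// dumpOnQuit logs every goroutine's stack each time SIGQUIT arrives.
// Because the signal is handled here, the process keeps running afterwards.
func dumpOnQuit(logger *log.Logger) {
	ch := make(chan os.Signal, 1)
	signal.Notify(ch, syscall.SIGQUIT)
	go func() {
		for range ch {
			logger.Printf("SIGQUIT received, goroutine dump:\n%s", allStacks())
		}
	}()
}

func main() {
	dumpOnQuit(log.New(os.Stderr, "", log.LstdFlags))
	// Stand-in for the daemon's real work.
	for {
		time.Sleep(time.Minute)
	}
}
```

With a handler like this, kill -QUIT <pid> writes the dump to the log and the process stays up, so it can still be restarted normally afterwards.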
It's a very simple shell script; it works on my OpenWrt without bash or any fancy packages.
Add it to crontab to run every 5 minutes. If you edit the .sh file on Windows, save it with UNIX line endings.
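The script itself isn't included in the thread. To illustrate the same check-then-restart idea, here is a minimal Go sketch. It assumes dnsmasq answers on 127.0.0.1:53, uses a hypothetical hostname mydevice.lan that is listed in /etc/hosts, and restarts via OpenWrt's /etc/init.d/dnsmasq restart. It builds a bare DNS query by hand instead of using net.Resolver because, by default, Go's resolver consults the local /etc/hosts first and could answer without ever asking dnsmasq. It only checks for a successful reply with at least one answer record; it does not parse the returned address.

```go
package main

import (
	"encoding/binary"
	"errors"
	"log"
	"net"
	"os/exec"
	"strings"
	"time"
)

// Hypothetical values: adjust for your own network.
const (
	dnsServer = "127.0.0.1:53"  // where dnsmasq listens (assumption)
	probeName = "mydevice.lan"  // a host listed in /etc/hosts (hypothetical)
	timeout   = 3 * time.Second // how long to wait before declaring a hang
)

// buildQuery encodes a minimal DNS query (RFC 1035 layout) for the A record
// of name, with the given ID and the "recursion desired" flag set.
func buildQuery(id uint16, name string) ([]byte, error) {
	msg := make([]byte, 12, 512)
	binary.BigEndian.PutUint16(msg[0:], id)
	binary.BigEndian.PutUint16(msg[2:], 0x0100) // flags: RD
	binary.BigEndian.PutUint16(msg[4:], 1)      // QDCOUNT = 1
	for _, label := range strings.Split(strings.TrimSuffix(name, "."), ".") {
		if len(label) == 0 || len(label) > 63 {
			return nil, errors.New("invalid label in " + name)
		}
		msg = append(msg, byte(len(label)))
		msg = append(msg, label...)
	}
	msg = append(msg, 0)          // root label terminates the name
	msg = append(msg, 0, 1, 0, 1) // QTYPE = A, QCLASS = IN
	return msg, nil
}

// answered reports whether resp is a successful reply to the query with
// the given ID that carries at least one answer record.
func answered(id uint16, resp []byte) bool {
	if len(resp) < 12 || binary.BigEndian.Uint16(resp[0:]) != id {
		return false
	}
	flags := binary.BigEndian.Uint16(resp[2:])
	isReply := flags&0x8000 != 0
	rcode := flags & 0x000F
	ancount := binary.BigEndian.Uint16(resp[6:])
	return isReply && rcode == 0 && ancount > 0
}

// probe sends one UDP query to server and waits up to limit for a reply.
func probe(server, name string, limit time.Duration) bool {
	const id = 0x4e44 // arbitrary query ID
	query, err := buildQuery(id, name)
	if err != nil {
		return false
	}
	conn, err := net.DialTimeout("udp", server, limit)
	if err != nil {
		return false
	}
	defer conn.Close()
	conn.SetDeadline(time.Now().Add(limit))
	if _, err := conn.Write(query); err != nil {
		return false
	}
	buf := make([]byte, 512)
	n, err := conn.Read(buf)
	return err == nil && answered(id, buf[:n])
}

func main() {
	if probe(dnsServer, probeName, timeout) {
		return // dnsmasq answered; nothing to do
	}
	log.Printf("no answer for %s from %s; restarting dnsmasq", probeName, dnsServer)
	out, err := exec.Command("/etc/init.d/dnsmasq", "restart").CombinedOutput()
	if err != nil {
		log.Fatalf("restart failed: %v: %s", err, out)
	}
}
```

Cross-compiled for the router (for example GOOS=linux GOARCH=mipsle GOMIPS=softfloat on MIPS little-endian hardware), it could be scheduled from cron every 5 minutes in the same way as the shell script.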
It happened again today. I used kill -QUIT and then restarted, but where can I find the logs?
If it's Merlin, in /jffs/syslog.log
@rs I'm using stock EdgeOS.
It just hung again for the 3rd time today. Here is what I can see in the log:
May 20 14:29:41 ubnt nextdns[23535]: Connected 45.90.28.0:443 (con=10ms tls=13ms, TLS<0>)
Each time, there is a "Discovered" event and then it freezes. I also found this in another log, just once; I didn't see it on the other hang:
You need to run master for the QUIT signal to work.
@rs Sorry, I don't know what that means. Is there any documentation on how to do that?
It means checking out the code and compiling it. I will create snapshots so you can test without going through that.
@rs OK, thank you. Let me know when they are ready and I can test ASAP. I'm getting about 3-5 freezes a day, so it shouldn't take long to get a log file.
You will find binaries here: https://drive.google.com/drive/folders/1-uurvV67jBtBOH6Y_SQv2-W8O6e4fHI3
Thanks for the files. I'm running into a problem installing:
ubnt@ubnt:~$ sh -c 'sh -c "$(curl -sL https://nextdns.io/install)"'
ubnt@ubnt:~$ sudo dpkg -i nextdns_v1.5.8-SNAPSHOT-394b795_linux_mipsle_softfloat.deb
Try the tarball instead of the .deb.
I think I reproduced the issue at home and tracked it down to a nasty bug in Go's http2 library: golang/go#23559.
Please try master again. Snapshot: https://drive.google.com/drive/folders/1W73Er37Do9Lg50rMQ0yunBEvBWexG6YW
The new build is up and running; I'll keep an eye on it over the weekend.
It happened again this morning.
ubnt@ubnt:~ sudo nextdns version
May 22 08:05:10 ubnt nextdns[6784]: Connected 108.61.155.162:443 (con=14ms tls=137ms, TLS13)
Apparently, the version running is not the version installed. The
You were right: it's installed in /usr/bin/nextdns, but the old version must have been in memory. I uninstalled, rebooted, and installed again. Now the logs show the snapshot version.
Fingers crossed.
I've been running into what appears to be the same issue ever since I started using
Also, it appears to log the time taken for domains that are not using
@rs Will there be a new release with this bug fix soon? For now, I have installed the snapshot build from the link shared earlier.
I will report back if I run into the same issue again.
The new version will be released soon; I wanted to confirm that it fixed the issue first. Please report whether it fixes the issue for you.
The fix is looking good. No hangs after 4 days.
The fix is working great for me as well, no more DNS outages like before 😅
This may be unrelated to this bug: my OpenWrt router is set up with multi-WAN failover, and sometimes after a failover there is a short DNS outage even once the switch to the backup WAN is complete (the failover thresholds usually trigger a failover in under 45s or so). I can ping an external IP address, but DNS resolution for anything not already cached just times out (because of #230 this is quite noticeable in my case, since I use Google DNS for the domain google.com). It recovers on its own after 1-2 minutes, when it tries to reconnect and goes out through the backup WAN. Perhaps it should be more aggressive in detecting connectivity failures and re-establishing connectivity?
Yes, we'll work on that. |
@rs Unfortunately, I experienced "the hang" twice today. I am running version 1.6.3 (latest) on my OpenWrt router. Restarting the service does remedy the issue, but it's not a fix.
Logs (timestamps are in UTC)
Please send kill -QUIT to the daemon's pid when this happens. It will print a stack trace in the logs, which would help me understand what's going on.
@rs Here are some logs.
Logs (timestamps are in UTC)
Logs 2 (timestamps are in UTC)
I may have been suffering from #238.
Sadly, I am still suffering from this issue. It has become increasingly unreliable: for me it doesn't hang once a day but once every hour. Unfortunately, I cannot post any new stack traces because they are being truncated; I already pointed this out in #238 (comment). It would be nice if this issue were reopened, as it's not really resolved.
When it hangs, how does dig behave?
It's a pretty straightforward timeout.
What do you get for
root@OpenWrt:~# nextdns log | grep Start
Please contact us on the support chat so we can debug together.
For the record, I haven't experienced any hangs since I started using the script.
The daemon seems to hang at random, about once a day. The status command shows it as running, and the logs show no errors before it fails. Restarting fixes it, but each time it goes down it causes a full internet outage. Is there a way to gather additional logs from the daemon?
Context