Skip to content

Stop file-based logging appenders on checkpoint and restart them on restore #38548

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
nobodyiam opened this issue Nov 25, 2023 · 3 comments
Closed
Labels
status: declined A suggestion or change that we don't feel we should currently apply theme: checkpoint-restore Issues related to coordinated restore at checkpoint type: enhancement A general enhancement

Comments

@nobodyiam
Copy link

  • spring-boot version: 3.2.0
  • spring version: 6.1.1
  • azul version: zulu17.46.29-ca-crac-jdk17.0.9-linux_x64

I am currently working on integrating CRaC support with Spring Boot 3.2.0, as detailed in this GitHub pull request.
During the checkpoint phase, I encountered an issue. It appears that the application's logging file, /opt/logs/qa-bot.log (which you can find defined here), may not be closing properly.
The application utilizes Logback for logging. Could this be a configuration oversight on my part, or is Logback possibly not yet compatible?

2023-11-25T14:51:58.172+08:00  WARN 2341104 --- [main] onfigReactiveWebServerApplicationContext : Exception encountered during context initialization - cancelling refresh attempt: org.springframework.context.ApplicationContextException: Failed to take CRaC checkpoint on refresh
2023-11-25T14:51:58.272+08:00  INFO 2341104 --- [main] .s.b.a.l.ConditionEvaluationReportLogger :

Error starting ApplicationContext. To display the condition evaluation report re-run your application with 'debug' enabled.
2023-11-25T14:51:58.326+08:00 ERROR 2341104 --- [main] o.s.boot.SpringApplication               : Application run failed

org.springframework.context.ApplicationContextException: Failed to take CRaC checkpoint on refresh
        at org.springframework.context.support.DefaultLifecycleProcessor$CracDelegate.checkpointRestore(DefaultLifecycleProcessor.java:534)
        at org.springframework.context.support.DefaultLifecycleProcessor.onRefresh(DefaultLifecycleProcessor.java:193)
        at org.springframework.context.support.AbstractApplicationContext.finishRefresh(AbstractApplicationContext.java:965)
        at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:619)
        at org.springframework.boot.web.reactive.context.ReactiveWebServerApplicationContext.refresh(ReactiveWebServerApplicationContext.java:66)
        at org.springframework.boot.SpringApplication.refresh(SpringApplication.java:753)
        at org.springframework.boot.SpringApplication.refreshContext(SpringApplication.java:455)
        at org.springframework.boot.SpringApplication.run(SpringApplication.java:323)
        at org.springframework.boot.SpringApplication.run(SpringApplication.java:1342)
        at org.springframework.boot.SpringApplication.run(SpringApplication.java:1331)
        at com.apolloconfig.apollo.ai.qabot.QABotApplication.main(QABotApplication.java:14)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:568)
        at org.springframework.boot.loader.launch.Launcher.launch(Launcher.java:91)
        at org.springframework.boot.loader.launch.Launcher.launch(Launcher.java:53)
        at org.springframework.boot.loader.launch.JarLauncher.main(JarLauncher.java:58)
Caused by: org.crac.CheckpointException: null
        at org.crac.Core$Compat.checkpointRestore(Core.java:144)
        at org.crac.Core.checkpointRestore(Core.java:237)
        at org.springframework.context.support.DefaultLifecycleProcessor$CracDelegate.checkpointRestore(DefaultLifecycleProcessor.java:528)
        ... 17 common frames omitted
        Suppressed: jdk.internal.crac.mirror.impl.CheckpointOpenFileException: /opt/logs/qa-bot.log
                at java.base/jdk.internal.crac.JDKFileResource.lambda$beforeCheckpoint$1(JDKFileResource.java:108)
                at java.base/jdk.internal.crac.mirror.Core.checkpointRestore1(Core.java:169)
                at java.base/jdk.internal.crac.mirror.Core.checkpointRestore(Core.java:286)
                at java.base/jdk.internal.crac.mirror.Core.checkpointRestore(Core.java:265)
                at jdk.crac/jdk.crac.Core.checkpointRestore(Core.java:72)
                at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
                at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
                at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
                at java.base/java.lang.reflect.Method.invoke(Method.java:568)
                at org.crac.Core$Compat.checkpointRestore(Core.java:141)
                ... 19 common frames omitted
        Suppressed: jdk.internal.crac.mirror.impl.CheckpointOpenResourceException: FD fd=10 type=unknown path=anon_inode:[eventpoll]
                at java.base/jdk.internal.crac.mirror.Core.translateJVMExceptions(Core.java:117)
                at java.base/jdk.internal.crac.mirror.Core.checkpointRestore1(Core.java:188)
                at java.base/jdk.internal.crac.mirror.Core.checkpointRestore(Core.java:286)
                at java.base/jdk.internal.crac.mirror.Core.checkpointRestore(Core.java:265)
                at jdk.crac/jdk.crac.Core.checkpointRestore(Core.java:72)
                at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
                at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
                at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
                at java.base/java.lang.reflect.Method.invoke(Method.java:568)
                at org.crac.Core$Compat.checkpointRestore(Core.java:141)
                ... 19 common frames omitted
        Suppressed: jdk.internal.crac.mirror.impl.CheckpointOpenResourceException: FD fd=11 type=unknown path=anon_inode:[eventfd]
                at java.base/jdk.internal.crac.mirror.Core.translateJVMExceptions(Core.java:117)
                at java.base/jdk.internal.crac.mirror.Core.checkpointRestore1(Core.java:188)
                at java.base/jdk.internal.crac.mirror.Core.checkpointRestore(Core.java:286)
                at java.base/jdk.internal.crac.mirror.Core.checkpointRestore(Core.java:265)
                at jdk.crac/jdk.crac.Core.checkpointRestore(Core.java:72)
                at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
                at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
                at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
                at java.base/java.lang.reflect.Method.invoke(Method.java:568)
                at org.crac.Core$Compat.checkpointRestore(Core.java:141)
                ... 19 common frames omitted
        Suppressed: jdk.internal.crac.mirror.impl.CheckpointOpenResourceException: FD fd=12 type=unknown path=anon_inode:[timerfd]
                at java.base/jdk.internal.crac.mirror.Core.translateJVMExceptions(Core.java:117)
                at java.base/jdk.internal.crac.mirror.Core.checkpointRestore1(Core.java:188)
                at java.base/jdk.internal.crac.mirror.Core.checkpointRestore(Core.java:286)
                at java.base/jdk.internal.crac.mirror.Core.checkpointRestore(Core.java:265)
                at jdk.crac/jdk.crac.Core.checkpointRestore(Core.java:72)
                at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
                at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
                at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
                at java.base/java.lang.reflect.Method.invoke(Method.java:568)
                at org.crac.Core$Compat.checkpointRestore(Core.java:141)
                ... 19 common frames omitted
        Suppressed: jdk.internal.crac.mirror.impl.CheckpointOpenSocketException: FD fd=13 type=socket path=socket:[91583428]
                at java.base/jdk.internal.crac.mirror.Core.translateJVMExceptions(Core.java:115)
                at java.base/jdk.internal.crac.mirror.Core.checkpointRestore1(Core.java:188)
                at java.base/jdk.internal.crac.mirror.Core.checkpointRestore(Core.java:286)
                at java.base/jdk.internal.crac.mirror.Core.checkpointRestore(Core.java:265)
                at jdk.crac/jdk.crac.Core.checkpointRestore(Core.java:72)
                at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
                at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
                at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
                at java.base/java.lang.reflect.Method.invoke(Method.java:568)
                at org.crac.Core$Compat.checkpointRestore(Core.java:141)
                ... 19 common frames omitted
@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged label Nov 25, 2023
@snicoll

This comment was marked as outdated.

@snicoll snicoll closed this as not planned Won't fix, can't repro, duplicate, stale Nov 25, 2023
@snicoll snicoll added status: duplicate A duplicate of another issue and removed status: waiting-for-triage An issue we've not yet triaged labels Nov 25, 2023
@sdeleuze
Copy link
Contributor

Per my comment in spring-projects/spring-framework#31680 (comment), I propose to reopen this issue since it looks possible to support that on Boot side as demonstrated by my experimental commit. I would suggest a Spring Lifecycle based implementation, and of course we need to check we can reliably reproduce what I tried in this experiment. Please let me know if you need my contribution.

@wilkinsona wilkinsona reopened this Nov 29, 2023
@wilkinsona wilkinsona changed the title CheckpointOpenFileException occurred with spring boot log files Stop file-based logging appenders on checkpoint and restart them on restore Nov 29, 2023
@wilkinsona wilkinsona added type: enhancement A general enhancement theme: checkpoint-restore Issues related to coordinated restore at checkpoint and removed status: duplicate A duplicate of another issue labels Nov 29, 2023
@wilkinsona wilkinsona added this to the 3.x milestone Nov 29, 2023
@wilkinsona
Copy link
Member

I don't think we should do this. The logging framework should take care of stopping and starting file-based appenders upon checkpoint and restore.

@wilkinsona wilkinsona closed this as not planned Won't fix, can't repro, duplicate, stale Apr 25, 2025
@wilkinsona wilkinsona added the status: declined A suggestion or change that we don't feel we should currently apply label Apr 25, 2025
@wilkinsona wilkinsona removed this from the 3.x milestone Apr 25, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status: declined A suggestion or change that we don't feel we should currently apply theme: checkpoint-restore Issues related to coordinated restore at checkpoint type: enhancement A general enhancement
Projects
None yet
Development

No branches or pull requests

5 participants