[Bug]: unrecoverable The collection's state is no longer correct, some Entities return errors until DAB is restarted #2694

@golfalot

What happened?

Background

  • We are running the official image mcr.microsoft.com/azure-databases/data-api-builder:1.4.27
  • This issue is not new to this release; it is long-standing across multiple DAB releases, on both .NET 6 and .NET 8
  • We are querying a single CosmosDB database; the DAB configuration Entities map to around 20 Cosmos containers
  • All the queries are point reads by primary key
  • The CosmosDB is not being altered during the period the error occurs

Error message
"Operations that change non-concurrent collections must have exclusive access. A concurrent update was performed on this collection and corrupted its state. The collection's state is no longer correct."

Symptoms

  1. The request to /graphql returns HTTP 200, but only some Entities are resolved in the data property, and the errors field is populated
  2. Once the error has occurred, the Entities that failed continue to fail consistently until the container is restarted/redeployed
  3. Curiously, the Entities in the error state include some that have not been queried

"errors": [
        {
            "message": "Operations that change non-concurrent collections must have exclusive access. A concurrent update was performed on this collection and corrupted its state. The collection's state is no longer correct.",
            "locations": [
                {
                    "line": 46,
                    "column": 3
                }
            ],
            "path": [
                "copernicusSlope_by_pk"
            ]
        },
        {
            "message": "Operations that change non-concurrent collections must have exclusive access. A concurrent update was performed on this collection and corrupted its state. The collection's state is no longer correct.",
            "locations": [
                {
                    "line": 102,
                    "column": 3
                }
            ],
            "path": [
                "hadUKgroundfrost_by_pk"
            ]
        },
...
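
Because the HTTP status stays 200, a caller can only notice the failure by inspecting the errors array in the response body. A minimal sketch of such a check, assuming the response shape shown above. The function name failed_entity_paths and the sample payload are made up for illustration and are not part of DAB:

```python
import json

# Substring of the message shown in the errors array above.
CORRUPTED_STATE_MARKER = "The collection's state is no longer correct"


def failed_entity_paths(body: dict) -> list[str]:
    """Return the top-level GraphQL fields whose error carries the marker."""
    paths = []
    for error in body.get("errors") or []:
        message = error.get("message", "")
        path = error.get("path") or []
        if CORRUPTED_STATE_MARKER in message and path:
            paths.append(str(path[0]))
    return paths


# Hypothetical response body: HTTP 200, partial data, errors populated.
sample = json.loads("""
{
  "data": {"copernicusSlope_by_pk": null, "someOtherEntity_by_pk": {"id": "1"}},
  "errors": [
    {"message": "Operations that change non-concurrent collections must have exclusive access. A concurrent update was performed on this collection and corrupted its state. The collection's state is no longer correct.",
     "locations": [{"line": 46, "column": 3}],
     "path": ["copernicusSlope_by_pk"]}
  ]
}
""")

print(failed_entity_paths(sample))
```

A non-empty result on a 200 response is what we currently have to treat as "this replica needs a restart".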

Repeatability

  • We cannot reproduce it at will; hundreds of thousands of requests are handled successfully over a period of a week or two
  • It appears to coincide with peak concurrent requests, in the region of 200-300 requests per minute

Desired behaviour

  • The odd error here and there is acceptable, but getting stuck in a persistent error state while still returning HTTP 200 response codes is difficult to manage operationally
  • Please consider a /health API that can report this persistent error state
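
To illustrate the kind of state such an endpoint could report, here is a rough sketch. It is not DAB code: the class EntityHealth, its threshold, and the report fields are all hypothetical, and it assumes "persistent" can be approximated as N consecutive failures for the same Entity.

```python
from collections import defaultdict


class EntityHealth:
    """Tracks consecutive failures per entity (hypothetical helper, not a DAB API)."""

    def __init__(self, threshold: int = 3):
        # Assumption: this many failures in a row means the entity is stuck.
        self.threshold = threshold
        self._consecutive_failures = defaultdict(int)

    def record(self, entity: str, ok: bool) -> None:
        if ok:
            # Any success resets the streak, so occasional errors are tolerated.
            self._consecutive_failures[entity] = 0
        else:
            self._consecutive_failures[entity] += 1

    def stuck_entities(self) -> list[str]:
        return sorted(
            name
            for name, count in self._consecutive_failures.items()
            if count >= self.threshold
        )

    def report(self) -> dict:
        """Payload a /health style endpoint could return."""
        stuck = self.stuck_entities()
        return {"status": "Unhealthy" if stuck else "Healthy", "stuckEntities": stuck}


health = EntityHealth(threshold=3)
for _ in range(3):
    health.record("copernicusSlope", ok=False)
health.record("hadUKgroundfrost", ok=False)
health.record("hadUKgroundfrost", ok=True)
print(health.report())
```

With something along these lines exposed over HTTP, a liveness probe could restart the affected replica automatically instead of relying on someone spotting the errors in a 200 response.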

Version

1.4.27.0

What database are you using?

CosmosDB NoSQL

What hosting model are you using?

Container Apps

Which API approach are you accessing DAB through?

GraphQL

Relevant log output

[{
        "severityLevel": "Error",
        "outerId": "0",
        "message": "Operations that change non-concurrent collections must have exclusive access. A concurrent update was performed on this collection and corrupted its state. The collection's state is no longer correct.",
        "type": "System.InvalidOperationException",
        "id": "60417033",
        "parsedStack": [{
                "assembly": "System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
                "method": "System.ThrowHelper.ThrowInvalidOperationException_ConcurrentOperationsNotSupported",
                "level": 0,
                "line": 0
            }, {
                "assembly": "System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
                "method": "System.Collections.Generic.Dictionary`2.FindValue",
                "level": 1,
                "line": 0
            }, {
                "assembly": "Azure.DataApiBuilder.Core, Version=1.4.27.0, Culture=neutral, PublicKeyToken=null",
                "method": "Azure.DataApiBuilder.Core.Resolvers.CosmosQueryEngine+<GetPartitionKeyPath>d__16.MoveNext",
                "level": 2,
                "line": 0
            }, {
                "assembly": "System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
                "method": "System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw",
                "level": 3,
                "line": 0
            }, {
                "assembly": "System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
                "method": "System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess",
                "level": 4,
                "line": 0
            }, {
                "assembly": "System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
                "method": "System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification",
                "level": 5,
                "line": 0
            }, {
                "assembly": "Azure.DataApiBuilder.Core, Version=1.4.27.0, Culture=neutral, PublicKeyToken=null",
                "method": "Azure.DataApiBuilder.Core.Resolvers.CosmosQueryEngine+<GetIdAndPartitionKey>d__17.MoveNext",
                "level": 6,
                "line": 344,
                "fileName": "/_/src/Core/Resolvers/CosmosQueryEngine.cs"
            }, {
                "assembly": "System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
                "method": "System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw",
                "level": 7,
                "line": 0
            }, {
                "assembly": "System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
                "method": "System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess",
                "level": 8,
                "line": 0
            }, {
                "assembly": "System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
                "method": "System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification",
                "level": 9,
                "line": 0
            }, {
                "assembly": "Azure.DataApiBuilder.Core, Version=1.4.27.0, Culture=neutral, PublicKeyToken=null",
                "method": "Azure.DataApiBuilder.Core.Resolvers.CosmosQueryEngine+<ExecuteAsync>d__8.MoveNext",
                "level": 10,
                "line": 81,
                "fileName": "/_/src/Core/Resolvers/CosmosQueryEngine.cs"
            }, {
                "assembly": "System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
                "method": "System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw",
                "level": 11,
                "line": 0
            }, {
                "assembly": "System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
                "method": "System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess",
                "level": 12,
                "line": 0
            }, {
                "assembly": "System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
                "method": "System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification",
                "level": 13,
                "line": 0
            }, {
                "assembly": "Azure.DataApiBuilder.Core, Version=1.4.27.0, Culture=neutral, PublicKeyToken=null",
                "method": "Azure.DataApiBuilder.Service.Services.ExecutionHelper+<ExecuteQueryAsync>d__5.MoveNext",
                "level": 14,
                "line": 79,
                "fileName": "/_/src/Core/Services/ExecutionHelper.cs"
            }, {
                "assembly": "System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
                "method": "System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw",
                "level": 15,
                "line": 0
            }, {
                "assembly": "System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
                "method": "System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess",
                "level": 16,
                "line": 0
            }, {
                "assembly": "System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
                "method": "System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification",
                "level": 17,
                "line": 0
            }, {
                "assembly": "Azure.DataApiBuilder.Core, Version=1.4.27.0, Culture=neutral, PublicKeyToken=null",
                "method": "ResolverTypeInterceptor+<>c__DisplayClass5_1+<<-ctor>b__5>d.MoveNext",
                "level": 18,
                "line": 23,
                "fileName": "/_/src/Core/Services/ResolverTypeInterceptor.cs"
            }, {
                "assembly": "System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
                "method": "System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw",
                "level": 19,
                "line": 0
            }, {
                "assembly": "System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
                "method": "System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess",
                "level": 20,
                "line": 0
            }, {
                "assembly": "System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
                "method": "System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification",
                "level": 21,
                "line": 0
            }, {
                "assembly": "HotChocolate.Execution, Version=12.22.6.0, Culture=neutral, PublicKeyToken=null",
                "method": "HotChocolate.Execution.Processing.Tasks.ResolverTask+<ExecuteResolverPipelineAsync>d__58.MoveNext",
                "level": 22,
                "line": 0
            }, {
                "assembly": "System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
                "method": "System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw",
                "level": 23,
                "line": 0
            }, {
                "assembly": "System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
                "method": "System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess",
                "level": 24,
                "line": 0
            }, {
                "assembly": "System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
                "method": "System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification",
                "level": 25,
                "line": 0
            }, {
                "assembly": "HotChocolate.Execution, Version=12.22.6.0, Culture=neutral, PublicKeyToken=null",
                "method": "HotChocolate.Execution.Processing.Tasks.ResolverTask+<TryExecuteAsync>d__57.MoveNext",
                "level": 26,
                "line": 0
            }
        ]
    }
]
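
Reading the top frames: the exception is raised from Dictionary.FindValue, called from CosmosQueryEngine.GetPartitionKeyPath. In .NET this exception is thrown when a plain Dictionary detects that unsynchronised concurrent writes have corrupted its internal structure, and lookups can then keep failing for the lifetime of that instance, which would match the symptoms. We have not confirmed this against the DAB source, so treat it as a guess from the trace. The usual remedy is to serialise access to the shared lookup or to use a concurrent collection. A sketch of the lock-guarded variant, where PartitionKeyPathCache and fetch_path are hypothetical names (Python's dict does not fail in the same way, so this only shows the shape of the fix):

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class PartitionKeyPathCache:
    """Shared lookup guarded by a lock (hypothetical; not the DAB implementation)."""

    def __init__(self, fetch_path):
        self._fetch_path = fetch_path  # callable: container name -> partition key path
        self._paths = {}
        self._lock = threading.Lock()

    def get(self, container: str) -> str:
        # Hold the lock for the read and the write so no caller can observe
        # or modify the dictionary while another caller is updating it.
        with self._lock:
            if container not in self._paths:
                self._paths[container] = self._fetch_path(container)
            return self._paths[container]


fetch_calls = []


def fetch_path(container: str) -> str:
    # Stand-in for a metadata lookup; records how often it is invoked.
    fetch_calls.append(container)
    return "/" + container + "Id"


cache = PartitionKeyPathCache(fetch_path)
containers = ["copernicusSlope", "hadUKgroundfrost"] * 50

with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(cache.get, containers))

print(sorted(set(results)), len(fetch_calls))
```

Holding the lock across the fetch keeps the sketch simple at the cost of serialising first-time lookups; a concurrent collection would avoid that.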

Metadata

Labels

bug (Something isn't working), triage (issues to be triaged)
