increased concurrency for DMSDK QueryBatcher Java-195 #1269

Closed
@llinggit

Description

Copied from https://project.marklogic.com/jira/browse/JAVA-195

BACKGROUND: QueryBatcher collects document URIs in a series of independent requests for each forest. That way, QueryBatcher can page efficiently by using the last document URI from the previous batch as the start value for the next batch of document URIs.
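The paging idea in the background can be illustrated with a small, self-contained sketch. This is not the QueryBatcher implementation: the class `UriPagingSketch`, its methods, and the in-memory sorted set standing in for one forest's URIs are all hypothetical, chosen only to show how the last URI of one batch serves as the start value for the next.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical illustration only; an in-memory sorted set stands in for one forest.
class UriPagingSketch {

    /** Returns up to batchSize URIs that sort strictly after 'after' (null means start at the beginning). */
    public static List<String> nextPage(NavigableSet<String> forestUris, String after, int batchSize) {
        NavigableSet<String> remaining =
            (after == null) ? forestUris : forestUris.tailSet(after, false);
        List<String> page = new ArrayList<>();
        for (String uri : remaining) {
            if (page.size() == batchSize) break;
            page.add(uri);
        }
        return page;
    }

    /** Pages through the whole forest, using the last URI of each batch as the next start value. */
    public static List<List<String>> collectAll(NavigableSet<String> forestUris, int batchSize) {
        List<List<String>> pages = new ArrayList<>();
        String last = null;
        while (true) {
            List<String> page = nextPage(forestUris, last, batchSize);
            if (page.isEmpty()) break;
            pages.add(page);
            last = page.get(page.size() - 1);
        }
        return pages;
    }

    public static void main(String[] args) {
        NavigableSet<String> forest =
            new TreeSet<>(List.of("/a.json", "/b.json", "/c.json", "/d.json", "/e.json"));
        // prints [[/a.json, /b.json], [/c.json, /d.json], [/e.json]]
        System.out.println(collectAll(forest, 2));
    }
}
```

Because each request only needs the previous batch's last URI, the cost of fetching a page does not grow with how far into the forest the job has progressed.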

CURRENT APPROACH: QueryBatcher processes a batch of documents (whether exporting the documents or transforming them in place) in the same thread that collects the document URIs. When configured with 2 threads per forest, QueryBatcher starts collecting the next set of document URIs concurrently with processing the current document URIs. The same batch size is used for collecting document URIs and processing documents. Typically, the cost of processing documents dominates the overall cost of the job.

REVISED APPROACH: Instead of having to compromise on a single batch size that might not be optimal for either kind of request, QueryBatcher can use a larger batch size for collecting the document URIs and a smaller batch size for processing the documents. As a result, multiple batches of documents from the same forest can themselves be processed concurrently.
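A rough sketch of the revised approach, under stated assumptions: one large batch of URIs collected from a forest is split into smaller document batches, and those are handed to a thread pool so they can be processed concurrently. The class `SplitBatchSketch`, the method names, and the parameters `docBatchSize` and `threads` are hypothetical and do not reflect the actual QueryBatcher API or its internals.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical illustration only; not the QueryBatcher implementation.
class SplitBatchSketch {

    /** Splits one large URI batch into consecutive document batches of at most docBatchSize. */
    public static List<List<String>> partition(List<String> uriBatch, int docBatchSize) {
        if (docBatchSize <= 0) throw new IllegalArgumentException("docBatchSize must be positive");
        List<List<String>> docBatches = new ArrayList<>();
        for (int i = 0; i < uriBatch.size(); i += docBatchSize) {
            int end = Math.min(i + docBatchSize, uriBatch.size());
            docBatches.add(new ArrayList<>(uriBatch.subList(i, end)));
        }
        return docBatches;
    }

    /** Processes the smaller document batches on a fixed pool; returns how many batches were processed. */
    public static int processConcurrently(List<String> uriBatch, int docBatchSize, int threads,
                                          Consumer<List<String>> processor) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (List<String> docBatch : partition(uriBatch, docBatchSize)) {
                pending.add(pool.submit(() -> processor.accept(docBatch)));
            }
            for (Future<?> f : pending) {
                f.get(); // wait for each document batch and surface any failure
            }
            return pending.size();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while processing", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("document batch failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> uriBatch = new ArrayList<>();
        for (int i = 0; i < 10; i++) uriBatch.add("/doc" + i + ".json");

        AtomicInteger processed = new AtomicInteger();
        int batches = processConcurrently(uriBatch, 3, 4, docs -> processed.addAndGet(docs.size()));
        // prints "4 document batches, 10 documents processed"
        System.out.println(batches + " document batches, " + processed.get() + " documents processed");
    }
}
```

In this sketch the URI batch (10) is larger than the document batch (3), so a single URI request feeds several processing tasks from the same forest at once.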

So we can address your issue, please include the following:

Version of MarkLogic Java Client API

See Readme.txt

Version of MarkLogic Server

See the Admin GUI on port 8001 or run xdmp:version() in Query Console (port 8000)

Java version

Run java -version

OS and version

For Mac, run sw_vers.
For Windows, run systeminfo | findstr /B /C:"OS Name" /C:"OS Version"
For Linux, run cat /etc/os-release and uname -r

Input: Some code to illustrate the problem, preferably in a state that can be independently reproduced on our end

Actual output: What did you observe? What errors did you see? Can you attach the logs? (Java logs, MarkLogic logs)

Expected output: What specifically did you expect to happen?

Alternatives: What else have you tried, actual/expected?
