Skip to content

Better support for COPY operations #213

@drjokepu

Description

@drjokepu

The approach discussed and implemented in #181 works well and it is very useful, however, it has a number of problems:

  1. It doesn’t (and maybe can’t) support COPY FROM operations.
  2. It doesn’t support the binary format.
  3. It doesn't support piping in data directly from an external source, data has to be provided row by row, cell by cell.
  4. It reuses the existing Go SQL interface in non-trivial, confusing ways. In particular, it makes Prepare() and Exec() do things they are not meant (and documented) to do.
  5. It decides whether a statement is a COPY statement or not by looking at the first four characters of the statement text and comparing it to the "COPY" word in a non case-sensitive manner. This will work most of the time but it’s not too difficult to imagine scenarios where this will fail to identify a copy operation (or maybe even identify a non-copy operation as one). The right way to do this is by parsing the query text into an AST and the right place to do that is in the backend. The front-end is simply not the right place to make decisions about the nature of queries.

In my opinion, the first three are very real shortcomings that need to be looked at, number 4 is an inconvenience the very least and number 5 is the least important as it’s only an edge case.

To address these shortcomings, I propose the following approach (which is basically an adaptation of Npgsql’s approach, something I have been using for years with much success):

  • Two structs: CopyIn, CopyOut (or CopyInWriter and CopyInReader), each conforming to io.Writer and io.Reader respectively.
  • Both structs maintaining a reference to the Stmt of the copy operation.
  • The operation is started with copyIn.Start() or copyOut.Start()
  • Then data is written to / read from these structs via the io.Writer and io.Reader interfaces or via the aid of a number of higher-level helper functions
  • copyIn.End() / copyOut.End() will complete the operation.

This is a somewhat different approach than what is currently implemented but I’m convinced that it will make the support for copy operations more complete and the application code more maintainable. The downside is that it cannot be made compatible with the current approach and that it would require the application code to work with the driver directly (although that is something that’s already being done when using the existing CopyIn() and CopyInSchema() functions). A possible way to mitigate the downsides is to keep around the old approach, temporarily or permanently.

I’m keen to get this thing working (as I need a solution to problems 1 and 2) so unless there is serious opposition to this, I will start working on this immediately.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions