Skip to content

Add SimpleTable construct based on Named Tuples #81

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 29 commits into from
May 28, 2025

Conversation

bishabosha
Copy link
Contributor

@bishabosha bishabosha commented May 4, 2025

Abstract

In this PR, introduce a new com.lihaoyi::scalasql-namedtuples module, supporting only Scala 3.7+, with three main contributions:

  • scalasql.namedtuples.SimpleTable class as an alternative to Table. The pitch is that you can use a "basic" case class with no higher-kinded type parameters to represent your models.
  • scalasql.namedtuples.NamedTupleQueryable - provides implicit Queryable so you can return a named tuple from a query.
  • scalasql.simple package object, that re-exports scalasql package, plus SimpleTable and NamedTupleQueryable

Example

Defining a table

import scalasql.simple.{*, given}

case class City(
    id: Int,
    name: String,
    countryCode: String,
    district: String,
    population: Long
)

object City extends SimpleTable[City]

return named tuples from queries

val query = City.select.map(c => (name = c.name, population = c.population))
val res: Seq[(name: String, population: Long)] = db.run(query)

Design

This PR manages to introduce SimpleTable entirely without needing to change the core library. It is designed around leveraging the new Named Tuples and programmatic structural typing facilities introduced in Scala 3.7. No macros are needed.

It was also designed such that any query should be pretty much source compatible if you change from Table to SimpleTable - (with the exception of dropping [Sc] type arguments).

Within a query, e.g. City.select.map(c => ...) we still need c to be an object that has all the fields of City, but they need to be wrapped by either scalasql.Expr[T] or scalasql.Column[T].

With Table this is done by the case class having an explicit type parameter (e.g. City[T[_]](name: T[String]...)) - so you would just substitute the parameter. Of course with SimpleTable the main idea is that you do not declare this T[_] type parameter, but the scalasql.query package expects it to be there.

The solution in this PR is to represent the table row within queries by Record[City, Expr], (rather than City[Expr]).
Record[C, T[_] is a new class, and essentially a structurally typed tuple that extends scala.Selectable with a named tuple Fields type member, derived mapping T over NamedTuple.From[C].

Record (and SimpleTable) still support using a nested case class field to share common columns (with a caveat*).

When you return a Record[C, T] from a query, you need to still get back a C, so SimpleTable provides an implicit Queryable.Row[Record[C, Expr], C], which is generated by compiletime derivation (via inline methods).

Implementation

To make a simpler diff, SimpleTable is entirely defined in terms of Table. i.e. here is the signature:

class SimpleTable[C](
    using name: sourcecode.Name,
    metadata0: Table.Metadata[[T[_]] =>> SimpleTable.MapOver[C, T]]
) extends Table[[T[_]] =>> SimpleTable.MapOver[C, T]](using name, metadata0)

The metadata0 argument is expected to be generated automatically from an inline given in SimpleTableMacros.scala (I suggest to rename to SimpleTableDerivation.scala)

Table[V[_[_]], being a higher kinded type, normally expects some case class Foo[T[_]], and fills in various places V[Expr] or V[Column] in queries, and V[Sc] for results. However for SimpleTable when T[_] is scalasql.Sc we want to return C and otherwise return this Record[C, T] so MapOver needs to be a match type:

object SimpleTable {
  type MapOver[C, T[_]] = T[Internal.Tombstone.type] match {
    case Internal.Tombstone.type => C // T is `Sc`
    case _ => Record[C, T]
  }
}

(Tombstone is used here to try and introduce a unique type that would never be used for any other purpose, i.e. be disjoint in the eyes of the match type resolver - also so we can convince ourselves that if T returns Tombstone it is probably the identity and not some accident.)

See #83 for another approach that eliminates removes the V[_[_]] from Table, Insert and various other places.

Design of Record

Record[C, T[_]] is implemented as a structural type that tries to wrap the fields of C in T. It has a few design constraints:

  • When C has a field of type X that is a nested Table, the corresponding field in Record[C, T] must also be Record[X, T].
  • when selecting a nested Record, preserve which type (e.g. Expr or Column) to wrap fields in from the outer level.
  • simple types in the IDE

First decision: Record uses a Fields type member for structural selection, rather than the traditional type refinements.
Why:

  • can be constructed without macros
  • internals can be based on Array rather than a hash map,
  • Fields derived via NamedTuple.From[C] is treated as part of the class implementation, this means you never get a huge refinement type showing up whenever you hover in the IDE.

Second decision: how to decide which fields are "scalar" data and which are nested records.
Constraints:

  • previously with Table, the only evidence that a field of type X is a nested table is implicit evidence of type Table.ImplicitMetadata[X]
  • match types can only dispatch on statically known information, and there is not currently any match type (or scala.compiletime intrinsic) that can tell you if there exists an implicit of type X.

Choices:

  • pre-compute the transitive closure of all possible nested fields as a third type argument to record, which in typical cases would be empty
  • require that the field have some marker, e.g. foo: Ref[Foo], unclear how much this would be intrusive at each use-site
  • introduce a marker class (SimpleTable.Nested) that the nested case class should extend - this does however prevent using "third party" classes as a nested table

The implicit derivation of Metadata also enforces that whenever an implicit metadata is discovered for use as field, the class must extend SimpleTable.Nested.

object SimpleTable {

  // needs to be a class so the match type reducer can "prove disjoint" to various other types. 
  abstract class Nested

  final class Record[C, T[_]](private val data: IArray[AnyRef]) extends Selectable:
    /**
     * For each field `x: X` of class `C` there exists a field `x` in this record of type
     * `Record[X, T]` if `X` is a case class that represents a table, or `T[X]` otherwise.
     */
    type Fields = NamedTuple.Map[
      NamedTuple.From[C],
      [X] =>> X match {
        case SimpleTable.Nested => Record[X, T]
        case _ => T[X]
      }
    ]
  }
}

Alternatives

Why is there Record[C, T] and not ExprRecord[C] or ColumnRecord[C] classes?

This was explored in #83, which requires a large change to the scalasql.query package, i.e. a new type hierarchy for Table (but makes more explicit in types the boundary between read-only queries, column updates, and results). It's also unclear if it relies upon "hacks" to work.

Why use Record[C, T] and not named tuples in queries?

  1. its almost impossible and (expensive when possible) to preserve the mapping that a large named tuple type (with no reference to the original class) should map back to the class after running the query
  2. Also would be ambiguous with when you explicitly want to return a named tuple, rather than map back to the table class.
  3. Record is a very cheap association directly back to the class it derives from, it also is a compact type if ever needed to be written explicitly, or shown by an IDE.

What is needed to get rid of Simpletable.Nested?

lets remind ourselves of the current definition of SimpleTable:

class SimpleTable[C](
    using name: sourcecode.Name,
    metadata0: Table.Metadata[[T[_]] =>> SimpleTable.MapOver[C, T]]
) extends Table[[T[_]] =>> SimpleTable.MapOver[C, T]](using name, metadata0) {
  given simpleTableGivenMetadata: SimpleTable.GivenMetadata[C] =
    SimpleTable.GivenMetadata(metadata0)
}

First thing - we determined that the transitive closure of available implicit SimpleTable.GivenMetadata[Foo] needs to be added as an argument to Record.

In #82 we explored this by just precomputing all the field types ahead of time in a macro, so the types would look a bit like Record[City, Expr, (id: Expr[Long], name: Expr[String], nested: (fooId: Expr[Long], ...))] which was very verbose.

An alternative could be to pass as a type parameter the classes which have a metadata defined. Something like Record[City, Expr, Foo | Bar] or Record[Foo, Expr, Empty.type], and modify the Record class as such:

-final class Record[C, T[_]](private val data: IArray[AnyRef]) extends Selectable:
+final class Record[C, T[_], <TABLES>](private val data: IArray[AnyRef]) extends Selectable:
    /**
     * For each field `x: X` of class `C` there exists a field `x` in this record of type
     * `Record[X, T]` if `X` is a case class that represents a table, or `T[X]` otherwise.
     */
    type Fields = NamedTuple.Map[
      NamedTuple.From[C],
-     [X] =>> X match {
-       case Nested => Record[X, T]
+     [X] =>> IsSub[X, <TABLES>] match {
+       case true => Record[X, T]
        case _ => T[X]
      }
    ]
  }

This could be a sweet spot between verbosity and extensibility to "uncontrolled" third party classes - but it is uncertain who in reality would be blocked by needing to extend SimpleTable.Nested. Also it is still to determine the potential impact on performance of compilation times, also the best place to compute this type without causing explosions of implicit searches.
You can see a prototype here: bishabosha/scalasql#table-named-tuples-infer-nested-tables

Build changes

introduce top level scalasql-namedtuples module

  • publishes as com.lihaoyi:scalasql-namedtuples_3
  • scalaVersion 3.7.0
  • sources are located in scalasql/namedtuples
  • depends on module scalasql("3.6.2") - so that it can re-export all of scalasql from the scalasql.simple package object

Also declare scalasql-namedtuples.test module

  • sources in scalasql/namedtuples/test
  • depends on module scalasql("3.6.2").test, so the custom test framework can be used to capture test results.

Testing changes

The main approach to testing was to copy test sources that already exist, and convert them to use SimpleTable with otherwise no other changes.

Assumptions made when copying:

  • the majority of existing scalasql tests are testing the query translation to SQL, rather than specifically the implementation of Table.Metadata generated by macros.
  • Since the only difference between using SimpleTable and Table are the signatures of implicits available, and the implementation of Table.Metadata, the test coverage for SimpleTable should focus on type checking, and that the fundamentals of TableMetadata are implemented correctly in a "round trip".
  • so I copied the tests from scalasql/test/src/ExampleTests.scala, scalasql/test/src/datatypes/DataTypesTests.scala and scalasql/test/src/datatypes/OptionalTests.scala, renaming the traits and switching from Table to SimpleTable, otherwise unchanged.
  • I also had to copy scalasql/test/src/ConcreteTestSuites.scala to scalasql/namedtuples/test/src/SimpleTableConcreteTestSuites.scala, commenting out most objects except OptionalTests and DataTypesTests, which now extend the duplicated and renamed suites. I also renamed the package to scalasql.namedtuples
  • finally I also copied scalasql/test/src/WorldSqlTests.scala (to scalasql/namedtuples/test/src/example/WorldSqlTestsNamedTuple.scala) to ensure that every example in tutorial.md compiles after switching to SimpleTable, and also to provide snippets I will include in the tutorial.md.
  • I also renamed a few tests in the duplicates of OptionalTests.scala and DataTypesTests.scala so that they would generate unique names that can be included in reference.md.

New tests:

  • demonstrations of returning Named Tuples from the various SimpleTableH2Example tests
  • scalasql/namedtuples/test/src/datatypes/LargeObjectTest.scala to stress test the compiler for large sized classes.
  • scalasql/namedtuples/test/src/example/foo.scala for quick testing of compilation, typechecking etc.
  • replacement of case class copy method with Record#updates in SimpleTableOptionalTests.scala

Documentation changes

tutorial.md and reference.md are generated from scala source files and test results in docs/generateDocs.mill.

I decided that rather than duplicate both tutorial.md and reference.md for SimpleTable, it would be better to avoid duplication, or potential drift, by reusing the original documents, but include specific notes when use of SimpleTable or NamedTupleQueryable adds new functionality or requires different code.

tutorial.md

To update tutorial.md I wrote the new text as usual in WorldSqlTests.scala. These texts exclusively talk about differences between the two approaches, such as declaring case classes, returning named tuples, or using the updates method on record. To support the new texts, I needed to include code snippets. But like in WorldSqlTests.scala I would prefer the snippets to be verified in a test suite. So the plan was to copy WorldSqlTests.scala to a new file, update the examples to use SimpleTable and include snippets from there.

To support including snippets from another file I updated the generateTutorial task in docs/generateDocs.mill. The change was that if the scanner sees a line // +INCLUDE SNIPPET [FOO] somefile in WorldSqlTests.scala, then it switches to reading the lines from somefile file, looking for the first line containing // +SNIPPET [FOO], then splices all lines of some file until it reaches a line containing // -SNIPPET [FOO], then it switches back to reading the lines in WorldSqlTests.scala.

The main idea is that snippets within somefile should be declared in the same order that they are included from WorldSqlTests.scala, meaning that the scanner traverses both files from top to bottom once (beginning from the previous position whenever switching back).

So to declare the snippets as mention above I copied WorldSqlTests.scala to scalasql/namedtuples/test/src/example/WorldSqlTestsNamedTuple.scala, replaced Table by SimpleTable and declared in there the snippets I wanted (and included them from WorldSqlTests.scala) .

Any other changes (e.g. newlines, indentation etc) are likely due to updating scalafmt.

reference.md

this file is generated by the generateReference task in docs/generateDocs.mill. It works by formatting the data from out/recordedTests.json (captured by running tests with a custom framework) and grouping tests by the suite they occur in.

Like with tutorial.md I thought it best to only add extra snippets that highlight the differences between the two kinds of table.

So first thing to capture the output of simple table tests, in the build I set the SCALASQL_RECORDED_TESTS_NAME and SCALASQL_RECORDED_SUITE_DESCRIPTIONS_NAME environment variables in the scalasql-namedtuples.test module: in this case recordedTestsNT.json and out/recordedSuiteDescriptionsNT.json.

Next I updated the generateReference task so that it also includes the recorded outputs from recordedTestsNT.json. This task handles grouping of tests and removing duplicates (e.g. the mysql, h2 variants). I made it so that for each Suite e.g. DataTypes it find the equivalent suite in the simple table results, and then only include the test names it hadn't seen at the end of that suite.

So therefore to include any test result from SimpleTableDataTypesTests.scala or SimpleTableOptionalTests.scala, it is only necessary to rename an individual test, and it will be appended to the bottom of the relevant group in reference.md. For this PR I did this by adding a - with SimpleTable suffix to relevant tests (i.e. the demonstration of nested classes, and the usage of Record#updates method)

@bishabosha bishabosha force-pushed the feature/table-named-tuples branch from 1501969 to 496dc23 Compare May 5, 2025 16:30
@bishabosha
Copy link
Contributor Author

@lihaoyi pushed a queryable for arbitrary named tuples, and now only nested case class need to extend SimpleTable.Source

@bishabosha
Copy link
Contributor Author

bishabosha commented May 18, 2025

I've been experimenting with augmenting class Record[C, T[_]] with two subclasses to "improve" inferred types, namely convert Record to trait Record[C, +T[_]], and introduce ExprRecord[C] extends Record[C, Expr] and ColRecord extends ExprRecord[C] with Record[C, Column]. because in the end the actual data would always be a ColRecord, just sometimes with a "view" as an ExprRecord (e.g. for V[Expr])

However this is not possible as covariant T can not be referenced from the Fields type of Record, and otherwise if T is invariant (or a member type) then it is illegal to override. And ColRecord must extend ExprRecord or else there should be class cast exception somewhere.

so I think it will have to remain as is

@bishabosha bishabosha force-pushed the feature/table-named-tuples branch from 944299f to 006172d Compare May 18, 2025 16:43
@bishabosha
Copy link
Contributor Author

latest commit renames Lift to MapOver, and removes some duplication

@bishabosha
Copy link
Contributor Author

moved the named tuple queryable stuff to a dedicated file.

the queryables function should only cache factories that still require the mappers to be provided.
@bishabosha bishabosha force-pushed the feature/table-named-tuples branch from e0f83fe to 339eb4b Compare May 20, 2025 20:37
@bishabosha bishabosha force-pushed the feature/table-named-tuples branch from 339eb4b to 6cf41b2 Compare May 20, 2025 21:10
@bishabosha
Copy link
Contributor Author

now i put everything available in one import import scalasql.simple.{*, given} which exports all of scalasql.package, NamedTupleQueryable.given and SimpleTable

@bishabosha bishabosha force-pushed the feature/table-named-tuples branch 4 times, most recently from aca6099 to e1e5cf6 Compare May 23, 2025 23:19
@bishabosha bishabosha force-pushed the feature/table-named-tuples branch from e1e5cf6 to 0de7f5c Compare May 23, 2025 23:39
@bishabosha bishabosha force-pushed the feature/table-named-tuples branch from 4600d21 to 40c3237 Compare May 24, 2025 11:20
@bishabosha bishabosha force-pushed the feature/table-named-tuples branch from a7997c1 to 88ede3a Compare May 24, 2025 12:27
@bishabosha bishabosha force-pushed the feature/table-named-tuples branch from 410c2c6 to f152114 Compare May 24, 2025 12:44
@bishabosha
Copy link
Contributor Author

@lihaoyi I have updated the PR description with an explanation of choices made

@lihaoyi
Copy link
Member

lihaoyi commented May 24, 2025

Thanks, will take a look

@bishabosha
Copy link
Contributor Author

@lihaoyi I have updated the PR description with changes to the build, documentation, and testing

@lihaoyi lihaoyi merged commit eb6bf3a into com-lihaoyi:main May 28, 2025
6 checks passed
@bishabosha bishabosha deleted the feature/table-named-tuples branch May 28, 2025 06:57
@bishabosha
Copy link
Contributor Author

here is a benchmark to compare against Table it seems a 1.5x increase in compilation time - https://gist.github.com/bishabosha/fd52492bb8977fa89e8dc28239ee48a6

good to know anyway, perhaps there are tweaks that can be made before anyone immediately jumps to macros

@lihaoyi
Copy link
Member

lihaoyi commented May 31, 2025

Not surprising there's some performance penalty, but it's probably fine. ScalaSql isn't meant to be hyper-optimized and generally the bottleneck for database queries is on the actual database anyway, rather than in application code

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants