[m68k] Add support for thread local storage #60354

Closed
0x59616e opened this issue Jan 28, 2023 · 15 comments

@0x59616e
Contributor

0x59616e commented Jan 28, 2023

This is a place for discussion regarding the support of TLS on m68k.

This issue blocks #59161.

cc @glaubitz

Reference for TLS.

  • this draft gives a detailed description of the TLS ABI of m68k.
  • this paper introduces the ELF handling for TLS.

A terse description of TLS

There are four access models for TLS:

  1. general dynamic
  2. local dynamic
  3. initial exec
  4. local exec

The first one is the most generic and can handle all possible cases. Models 2, 3 and 4 are optimizations that apply under certain circumstances.
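To make the four models concrete, here is a minimal sketch that requests each of them for one variable through the `tls_model` variable attribute accepted by GCC and Clang. The variable and function names are made up for illustration, and the sketch says nothing about what the m68k backend emits for each model.

```c
#include <assert.h>

/* Hypothetical counters, one per access model. The tls_model attribute is a
   GCC/Clang extension that requests a model for a single variable. */
static __thread int counter_gd __attribute__((tls_model("global-dynamic")));
static __thread int counter_ld __attribute__((tls_model("local-dynamic")));
static __thread int counter_ie __attribute__((tls_model("initial-exec")));
static __thread int counter_le __attribute__((tls_model("local-exec")));

/* Increments every counter in the calling thread and returns their sum. */
int bump_all(void) {
    counter_gd += 1;
    counter_ld += 1;
    counter_ie += 1;
    counter_le += 1;
    return counter_gd + counter_ld + counter_ie + counter_le;
}
```

Compiling this with `-S` and comparing the code generated for each counter is one way to see how a given target implements the models.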

Current status of gcc

It is odd that gcc seems to support only the two more restricted access models: initial exec and local exec.

The other two, general dynamic and local dynamic, are unsupported, which may cause problems with dynamically loaded objects.

You can verify this claim with the following code:

$ echo '
extern __thread int myvar;


int *get_addr(void) {
  return &myvar;
}
' > test.c
$ m68k-linux-gnu-gcc -ftls-model=global-dynamic -S -o - test.c
#NO_APP
        .file   "test.c"
        .text
        .globl  __m68k_read_tp
        .align  2
        .globl  get_addr
        .type   get_addr, @function
get_addr:
        link.w %fp,#0
        move.l %a5,-(%sp)
        lea (%pc, _GLOBAL_OFFSET_TABLE_@GOTPC), %a5
        jsr __m68k_read_tp
        move.l %a0,%d0
        add.l myvar@TLSIE(%a5),%d0
        move.l %d0,%d1
        move.l %d1,%a0
        move.l -4(%fp),%a5
        unlk %fp
        rts
        .size   get_addr, .-get_addr
        .ident  "GCC: (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0"
        .section        .note.GNU-stack,"",@progbits

It still uses the initial-exec model instead of the general-dynamic model specified on the command line.

I have no idea why gcc fails to support the general-dynamic model. Or maybe my understanding is incorrect?

What should we do now?

We need to know the status of the runtime (glibc and Linux) so that we can decide to what extent we need to support TLS.

@llvmbot
Member

llvmbot commented Jan 28, 2023

@llvm/issue-subscribers-backend-m68k

@0x59616e
Contributor Author

0x59616e commented Jan 28, 2023

It is hilarious that I always break through the bottleneck right after submitting an issue. This time is no exception.

The reason gcc does not use the general-dynamic model is the absence of this parameter: -fPIC

$ m68k-linux-gnu-gcc -fPIC -ftls-model=global-dynamic -S -o - ./test.c
#NO_APP
	.file	"test.c"
	.text
	.globl	__tls_get_addr
	.align	2
	.globl	get_addr
	.type	get_addr, @function
get_addr:
	link.w %fp,#0
	move.l %a5,-(%sp)
	lea (%pc, _GLOBAL_OFFSET_TABLE_@GOTPC), %a5
	move.l %a5,%d0
	add.l #myvar@TLSGD,%d0
	move.l %d0,-(%sp)
	bsr.l __tls_get_addr@PLTPC
	addq.l #4,%sp
	move.l %a0,%d0
	move.l %d0,%d1
	move.l %d1,%a0
	move.l -4(%fp),%a5
	unlk %fp
	rts
	.size	get_addr, .-get_addr
	.ident	"GCC: (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0"
	.section	.note.GNU-stack,"",@progbits

I still don't know why -fPIC is necessary, though.

I'm still eager to know what we should take care of when implementing TLS support. Any comment is precious and will be appreciated.

@eschaton

If there was ever a TLS ABI defined for SVR4 on 68K, then that’s what should probably be used initially.

@glaubitz
Contributor

glaubitz commented Jan 29, 2023

The following text describes the m68k TLS and NPTL ABI as it was designed by CodeSourcery.

Source: https://lists.debian.org/debian-68k/2007/11/msg00071.html

CodeSourcery has been investigating implementing TLS (Thread-Local
Storage) and NPTL (Native POSIX Thread Library) for ColdFire
processors.  The proposed TLS ABI for ColdFire and m68k, including the
required kernel interfaces, is below; any comments?

We do not at present have a timescale for the implementation to be
available.  Toolchain patches will probably be contributed to the
respective development mainlines in the usual order (first binutils,
then GCC, then glibc).

ColdFire and m68k TLS and NPTL ABI draft version 0.2
====================================================

For background reading on TLS, see Ulrich Drepper's document
<http://people.redhat.com/drepper/tls.pdf>.

Design choices
--------------

* There are no spare registers available to designate as the thread
  register.  Therefore, kernel magic is needed to obtain the thread
  pointer from userspace.  Kernel helpers are provided in a vDSO since
  they will need unwind information associated; see details below.
  Compiler-generated code will use an ABI-defined function
  __m68k_read_tp with that function handling the details of calling
  the vDSO.

* Use TLS variant I (TLS_DTV_AT_TP in glibc terms), where the TLS data
  goes after the TCB.

* The thread pointer points to 0x7000 (the value of TLS_TCB_OFFSET in
  glibc) after the start of the TLS data areas, as on Power and MIPS.
  This makes a greater amount of the data accessible with signed
  16-bit offsets from the thread pointer than with an unbiased
  pointer.  (0x7000 is used instead of 0x8000 so that the TCB can also
  be accessed with 16-bit offsets from the thread pointer.)

* The DTP for a module points to 0x8000 (the value of TLS_DTV_OFFSET
  in glibc) after the start of the TLS data for that module, as on
  Power and MIPS.

* There are no linker optimizations to convert one TLS model into
  another; as such, the compiler can rearrange and optimize the
  instruction sequences shown.  The relocations can be applied to
  extension words in many different instructions.

* The __tls_get_addr function is

  typedef struct {
    unsigned long int ti_module;
    unsigned long int ti_offset;
  } tls_index;

  extern void *__tls_get_addr (tls_index *ti);

* All the static relocations for offsets from GOT, DTP or TP are
  defined in 8-bit, 16-bit and 32-bit forms, similarly to existing
  m68k/ColdFire relocations.  Both the 16-bit and 32-bit forms are
  likely to be of use in compiler-generated code.
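The choice of 0x7000 can be checked with a little arithmetic. The sketch below assumes only what the draft states (a thread pointer placed 0x7000 bytes past the start of the TLS data and signed 16-bit offsets); the function names are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

enum { TLS_TCB_OFFSET = 0x7000 };  /* bias named in the draft */

/* Converts an offset measured from the start of the TLS data into an offset
   measured from the biased thread pointer. */
int32_t tp_relative(int32_t offset_from_tls_start) {
    return offset_from_tls_start - TLS_TCB_OFFSET;
}

/* Returns 1 when a thread-pointer-relative offset fits a signed 16-bit field. */
int fits_signed16(int32_t value) {
    return value >= INT16_MIN && value <= INT16_MAX;
}
```

Under these assumptions the reachable window runs from 0x1000 bytes before the start of the TLS data (room for the TCB) up to offset 0xEFFF inside it. With a bias of 0x8000 the window would start exactly at the TLS data and leave nothing reachable before it.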

Kernel helpers
--------------

This TLS ABI defines a function __m68k_read_tp, provided by libc.
This returns the thread pointer in register a0 (not d0) and may
clobber other call-clobbered registers.  The compiler will generate
calls to this function for the initial exec and local exec models.

To implement this function and other requirements for NPTL, four
kernel helpers are to be provided in a vDSO (as provided by the kernel
on Power and other architectures).  The symbols indicated are exported
at symbol version LINUX_2.6.  Full DWARF unwind information for all
these functions must be included in the vDSO, as thread cancellation
may need to unwind from any point in any of these functions.  The
kernel informs glibc of the location of the vDSO by putting an
AT_SYSINFO_EHDR entry in the auxiliary vector passed to each process.
If glibc is configured for a subset of processors where the necessary
operations do not require a kernel helper, then it does not need to
use the kernel helper (for example, glibc configured only for m68k
processors with a cas instruction does not need to use the
compare-and-exchange helper), but the kernel must provide all these
helpers on all m68k and ColdFire processors so that
lowest-common-denominator glibc binaries can work across all
processors.
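For readers unfamiliar with the auxiliary vector, the following sketch shows how a user-space program on Linux with glibc can look up the AT_SYSINFO_EHDR entry through `getauxval`. It only illustrates the lookup mechanism on whatever host it runs on; whether an m68k kernel actually provides a vDSO with the helpers described in the draft is not something this code can tell. The helper names are hypothetical.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>
#include <sys/auxv.h>

/* Returns the vDSO base address advertised by the kernel, or NULL when the
   auxiliary vector has no AT_SYSINFO_EHDR entry. */
const void *vdso_base(void) {
    return (const void *)getauxval(AT_SYSINFO_EHDR);
}

/* Returns 1 when the mapping starts with the ELF magic bytes. */
int looks_like_elf(const void *base) {
    return base != NULL && memcmp(base, ELFMAG, SELFMAG) == 0;
}
```

A libc would go on to parse the ELF image at that address and resolve the versioned symbols from it; that part is omitted here.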

The helper __kernel_read_tp returns the thread pointer in register a0
(not d0) and may clobber other call-clobbered registers.  (Because it
is only called from __m68k_read_tp, which is called through the PLT,
and the resolver may clobber call-clobbered registers, there seems to
be no advantage in restricting clobbers from this helper.)

Beyond the helper required for TLS, three further kernel helpers are
proposed for NPTL implementation: one to provide an atomic
compare-and-exchange operation (not available directly in the ColdFire
instruction set), one to provide a memory barrier (which can just
return to the user for non-SMP) and one to set the thread pointer.

The helper __kernel_atomic_cmpxchg_32 compares the 32-bit value at the
location pointed to by a0 with the value in d0.  If the values are
equal, it writes the value in d1 to the location pointed to by a0;
otherwise, it writes the value at the location pointed to by a0 to d0.
It does not clobber any registers other than the condition codes (and
the modification of d0 indicated so that d0 is returned with the
original value of the memory location in all cases).  (On m68k - where
this kernel helper would only be used if glibc is built for the
intersection of ColdFire and m68k - this could be implemented with a
single cas instruction and a return.)
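The register contract of the compare-and-exchange helper may be easier to follow as plain C. This is a single-threaded model of the semantics described above and is deliberately not atomic; the function name and the mapping of parameters to registers are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Models the draft's helper without any atomicity:
   mem plays the role of a0, expected of d0, desired of d1.
   The return value is what d0 would hold afterwards, which is always the
   original contents of *mem. */
uint32_t model_cmpxchg_32(uint32_t *mem, uint32_t expected, uint32_t desired) {
    assert(mem != NULL);
    uint32_t old = *mem;
    if (old == expected)
        *mem = desired;   /* values matched: store the new value */
    return old;
}
```

A caller detects success by comparing the returned value with the expected one, which is the usual pattern for building locks on top of such a primitive.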

The helper __kernel_atomic_barrier provides a memory barrier.  It does
not clobber any registers other than the condition codes.  On non-SMP,
it can just return to the user; on SMP it needs to ensure memory
synchronization between processors.

The helper __kernel_write_tp sets the thread pointer to the value in
a0.  It does not clobber any registers other than the condition codes.

Offset length issues
--------------------

On ColdFire (and m68k before 68020), only 16-bit offsets can be used
in memory addresses.  On m68k (68020 and later), 32-bit offsets can be
used; a ".w" assembly suffix is used for 16-bit offsets, and otherwise
the offsets are 32 bits.

The use of 16-bit offsets limits GOT size to 8192 entries (the
toolchain does not use negative GOT offsets on m68k/ColdFire).  On
m68k (68020 and later), GCC uses 32-bit offsets with -fPIC and 16-bit
offsets with -fpic (and does not need to use GOT accesses for non-PIC
code at present).
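The figure of 8192 entries follows from simple arithmetic, sketched below. It assumes 4-byte GOT entries and, as the draft says, that only non-negative 16-bit offsets are used; the constant and function names are made up.

```c
#include <assert.h>

enum {
    MAX_POSITIVE_DISP16 = 0x7fff,  /* largest non-negative signed 16-bit offset */
    GOT_ENTRY_SIZE = 4             /* one 32-bit word per GOT entry */
};

/* Number of GOT entries whose starting offset is reachable with a
   non-negative signed 16-bit displacement. */
int max_got_entries_16bit(void) {
    return (MAX_POSITIVE_DISP16 + 1) / GOT_ENTRY_SIZE;
}
```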

The proposals here do not address GOT size limitations, although an
example is given to illustrate a possible longer access sequence to
avoid those limitations on ColdFire.  The examples using offsets such
as #x@TLSGD in GOT accesses are shown for ColdFire and use the 16-bit
relocations shown.  For m68k (68020 and later), either the syntax
shown may be used, with a 32-bit relocation, or a ".w" suffix may be
used, with a 16-bit relocation.  It is proposed that the compiler, on
m68k (68020 and later), will use ".w" for -fpic and the 32-bit offsets
otherwise.  (No specific option is proposed to choose between 16-bit
and 32-bit offsets for the non-PIC, initial exec case, though such an
option could be added later.)

The same issue as for GOT accesses also applies to accesses to TLS
data using the local dynamic and local exec models.  The example code
sequences determine the address of the variable, but typically it will
be desired to read or write the variable and this may be done more
efficiently using offset addressing.  It is proposed that by default
the compiler will require the relevant TLS area to be accessible using
16-bit offsets, and that an option -mxtls must be used when compiling
objects that use the local dynamic or local exec models and will be
linked into a module with too large a TLS area for 16-bit offset
addressing.

Conventions
-----------

In the instruction sequences shown below, a5 is used to refer to the
GOT pointer (which must already have been loaded).  Apart from the
ABI-defined registers used for thread-pointer return (a0) and
__tls_get_addr return (d0), other registers may be used where
convenient.

The relocations shown on instructions are to be understood to be
applied to the extension word or words of those instructions.

Code sequences are shown in the form:

instruction                   relocation          against variable

General Dynamic TLS model
-------------------------

Code sequence:

pea #x@TLSGD(%a5)             R_68K_TLS_GD16      x
jbsr __tls_get_addr

Outstanding relocations:

GOT[n]                        R_68K_TLS_DTPMOD32  x
GOT[n+1]                      R_68K_TLS_DTPREL32  x

The R_68K_TLS_GD16 relocation causes the static linker to allocate two
consecutive GOT entries for a tls_index structure and apply the
indicated relocations to them.  The dynamic linker fills in those
entries at runtime.  The code sequence leaves the address of x in d0.

On ColdFire, the example code sequence is limited to a 16-bit GOT
offset, as discussed above.  If a larger GOT is required on ColdFire,
a longer instruction sequence must be used; for example:

move.l %a5,%a0
add.l #x@TLSGD,%a0            R_68K_TLS_GD32      x
pea (%a0)
jbsr __tls_get_addr

Local Dynamic TLS model
-----------------------

Code sequence:

pea #x@TLSLDM(%a5)            R_68K_TLS_LDM16     x
jbsr __tls_get_addr
...
move.l %d0,%a1
add.l #x1@TLSLDO,%a1          R_68K_TLS_LDO32     x1

Outstanding relocations:

GOT[n]                        R_68K_TLS_DTPMOD32  x

The R_68K_TLS_LDM16 relocation causes the static linker to allocate
two consecutive GOT entries for a tls_index structure and apply the
indicated relocation to the first; the second has a value of 0 and no
relocation.  The dynamic linker fills in those entries at runtime.
The first part of the code sequence leaves the address of the TLS
block for the current module (biased by 0x8000 as discussed above) in
%d0.  The second part of the code sequence determines the address of
x1 based on the address of the TLS block; the static linker resolves
R_68K_TLS_LDO32 to the correct offset from the (biased) DTP value.
Other code sequences may be used to access the value of x1 rather than
computing its address, possibly with R_68K_TLS_LDO16 relocations
depending on whether the size of the TLS area for this module is known
to be at most 64k.

Note that the local dynamic model is generally only beneficial if a
function is accessing more than one TLS variable with this model and
so can reuse the TLS block address.

The same comments about GOT size apply as for the general dynamic
model.

Initial Exec TLS model
----------------------

Code sequence:

jbsr __m68k_read_tp
...
move.l #x@TLSIE(%a5),%a1      R_68K_TLS_IE16      x
add.l %a0,%a1

Outstanding relocations (apart from those associated with calling
__m68k_read_tp through the PLT):

GOT[n]                        R_68K_TLS_TPREL32   x

The jbsr instruction loads the thread pointer into a0.  This may be
reused for each variable accessed with this model.  Each
R_68K_TLS_IE16 relocation causes the allocation of a single GOT entry
with the indicated relocation; this GOT entry is set up by the dynamic
linker with the offset for that TLS variable relative to the (biased)
thread pointer.  The second part of the code sequence loads this
offset from the GOT and adds the thread pointer to put the address of
x in a1.

The same comments about GOT size apply as for the general dynamic
and local dynamic models.

Local Exec TLS model
--------------------

Code sequence:

jbsr __m68k_read_tp
...
move.l %a0,%a1
add.l #x@TLSLE,%a1            R_68K_TLS_LE32      x

No outstanding relocations (apart from those associated with calling
__m68k_read_tp through the PLT).

The jbsr instruction loads the thread pointer into a0.  This may be
reused for each variable accessed with this model or the initial exec
model.  The R_68K_TLS_LE32 relocation is resolved by the static linker
to the offset of x relative to the (biased) thread pointer.  The
second part of the code sequence puts the address of x in a1.  Other
code sequences may be used to access the value of x rather than
computing its address, possibly with R_68K_TLS_LE16 relocations
depending on whether all of the TLS area for the executable is known
to be within 32k of the thread pointer.

Debug information
-----------------

DWARF-2 sequence:

DW_OP_addr
.word #x@TLSLDO+0x8000        R_68K_TLS_LDO32     x
DW_OP_GNU_push_tls_address

No outstanding relocations.

The static linker resolves the relocation and offset to put the
unbiased address of x relative to the TLS block for its module in the
word of debug information.  GDB then uses this to locate the variable
at debug time.

ELF relocations
---------------

Static relocations:

#define R_68K_TLS_GD32      25
#define R_68K_TLS_GD16      26
#define R_68K_TLS_GD8       27
#define R_68K_TLS_LDM32     28
#define R_68K_TLS_LDM16     29
#define R_68K_TLS_LDM8      30
#define R_68K_TLS_LDO32     31
#define R_68K_TLS_LDO16     32
#define R_68K_TLS_LDO8      33
#define R_68K_TLS_IE32      34
#define R_68K_TLS_IE16      35
#define R_68K_TLS_IE8       36
#define R_68K_TLS_LE32      37
#define R_68K_TLS_LE16      38
#define R_68K_TLS_LE8       39

Dynamic relocations:

#define R_68K_TLS_DTPMOD32  40
#define R_68K_TLS_DTPREL32  41
#define R_68K_TLS_TPREL32   42

@glaubitz
Contributor

Joseph Myers was kind enough to provide an updated version of the documentation via email.

He says, though, that he cannot guarantee the documentation reflects the actual implementation 100%.

ColdFire and m68k TLS and NPTL ABI version 0.3
==============================================

For background reading on TLS, see Ulrich Drepper's document
<http://people.redhat.com/drepper/tls.pdf>.

Design choices
--------------

* There are no spare registers available to designate as the thread
  register.  Therefore, kernel magic is needed to obtain the thread
  pointer from userspace.  Kernel helpers are provided in a vDSO since
  they will need unwind information associated; see details below.
  Compiler-generated code will use an ABI-defined function
  __m68k_read_tp with that function handling the details of calling
  the vDSO.

* Use TLS variant I (TLS_DTV_AT_TP in glibc terms), where the TLS data
  goes after the TCB.

* The thread pointer points to 0x7000 (the value of TLS_TCB_OFFSET in
  glibc) after the start of the TLS data areas, as on Power and MIPS.
  This makes a greater amount of the data accessible with signed
  16-bit offsets from the thread pointer than with an unbiased
  pointer.  (0x7000 is used instead of 0x8000 so that the TCB can also
  be accessed with 16-bit offsets from the thread pointer.)

* The DTP for a module points to 0x8000 (the value of TLS_DTV_OFFSET
  in glibc) after the start of the TLS data for that module, as on
  Power and MIPS.

* There are no linker optimizations to convert one TLS model into
  another; as such, the compiler can rearrange and optimize the
  instruction sequences shown.  The relocations can be applied to
  extension words in many different instructions.

* The __tls_get_addr function is

  typedef struct {
    unsigned long int ti_module;
    unsigned long int ti_offset;
  } tls_index;

  extern void *__tls_get_addr (tls_index *ti);

* All the static relocations for offsets from GOT, DTP or TP are
  defined in 8-bit, 16-bit and 32-bit forms, similarly to existing
  m68k/ColdFire relocations.  Both the 16-bit and 32-bit forms are
  likely to be of use in compiler-generated code.

Kernel helpers
--------------

This TLS ABI defines a function __m68k_read_tp, provided by libc.
This returns the thread pointer in register a0 (not d0) and may
clobber other call-clobbered registers.  The compiler will generate
calls to this function for the initial exec and local exec models.

To implement this function and other requirements for NPTL, four
kernel helpers are to be provided in a vDSO (as provided by the kernel
on Power and other architectures).  The symbols indicated are exported
at symbol version LINUX_2.6.  Full DWARF unwind information for all
these functions must be included in the vDSO, as thread cancellation
may need to unwind from any point in any of these functions.  The
kernel informs glibc of the location of the vDSO by putting an
AT_SYSINFO_EHDR entry in the auxiliary vector passed to each process.
If glibc is configured for a subset of processors where the necessary
operations do not require a kernel helper, then it does not need to
use the kernel helper (for example, glibc configured only for m68k
processors with a cas instruction does not need to use the
compare-and-exchange helper), but the kernel must provide all these
helpers on all m68k and ColdFire processors so that
lowest-common-denominator glibc binaries can work across all
processors.

The helper __kernel_read_tp returns the thread pointer in register a0
(not d0) and may clobber other call-clobbered registers.  (Because it
is only called from __m68k_read_tp, which is called through the PLT,
and the resolver may clobber call-clobbered registers, there seems to
be no advantage in restricting clobbers from this helper.)

Beyond the helper required for TLS, three further kernel helpers are
proposed for NPTL implementation: one to provide an atomic
compare-and-exchange operation (not available directly in the ColdFire
instruction set), one to provide a memory barrier (which can just
return to the user for non-SMP) and one to set the thread pointer.

The helper __kernel_atomic_cmpxchg_32 compares the 32-bit value at the
location pointed to by a0 with the value in d0.  If the values are
equal, it writes the value in d1 to the location pointed to by a0;
otherwise, it writes the value at the location pointed to by a0 to d0.
It does not clobber any registers other than the condition codes (and
the modification of d0 indicated so that d0 is returned with the
original value of the memory location in all cases).  (On m68k - where
this kernel helper would only be used if glibc is built for the
intersection of ColdFire and m68k - this could be implemented with a
single cas instruction and a return.)

The helper __kernel_atomic_barrier provides a memory barrier.  It does
not clobber any registers other than the condition codes.  On non-SMP,
it can just return to the user; on SMP it needs to ensure memory
synchronization between processors.

***** being removed
The helper __kernel_write_tp sets the thread pointer to the value in
a0.  It does not clobber any registers other than the condition codes.

Offset length issues
--------------------

On ColdFire (and m68k before 68020), only 16-bit offsets can be used
in memory addresses.  On m68k (68020 and later), 32-bit offsets can be
used; a ".w" assembly suffix is used for 16-bit offsets, and otherwise
the offsets are 32 bits.

The use of 16-bit offsets limits GOT size to 8192 entries (the
toolchain does not use negative GOT offsets on m68k/ColdFire).  On
m68k (68020 and later), GCC uses 32-bit offsets with -fPIC and 16-bit
offsets with -fpic (and does not need to use GOT accesses for non-PIC
code at present).

The proposals here do not address GOT size limitations, although an
example is given to illustrate a possible longer access sequence to
avoid those limitations on ColdFire.  The examples using offsets such
as #x@TLSGD in GOT accesses are shown for ColdFire and use the 16-bit
relocations shown.  For m68k (68020 and later), either the syntax
shown may be used, with a 32-bit relocation, or a ".w" suffix may be
used, with a 16-bit relocation.  It is proposed that the compiler, on
m68k (68020 and later), will use ".w" for -fpic and the 32-bit offsets
otherwise.  (No specific option is proposed to choose between 16-bit
and 32-bit offsets for the non-PIC, initial exec case, though such an
option could be added later.)

The same issue as for GOT accesses also applies to accesses to TLS
data using the local dynamic and local exec models.  The example code
sequences determine the address of the variable, but typically it will
be desired to read or write the variable and this may be done more
efficiently using offset addressing.  It is proposed that by default
the compiler will require the relevant TLS area to be accessible using
16-bit offsets, and that an option -mxtls must be used when compiling
objects that use the local dynamic or local exec models and will be
linked into a module with too large a TLS area for 16-bit offset
addressing.

Conventions
-----------

In the instruction sequences shown below, a5 is used to refer to the
GOT pointer (which must already have been loaded).  Apart from the
ABI-defined registers used for thread-pointer return (a0) and
__tls_get_addr return (a0), other registers may be used where
convenient.

The relocations shown on instructions are to be understood to be
applied to the extension word or words of those instructions.

Code sequences are shown in the form:

instruction                   relocation          against variable

General Dynamic TLS model
-------------------------

Code sequence:

pea x@TLSGD(%a5)              R_68K_TLS_GD16      x
jbsr __tls_get_addr

Outstanding relocations:

GOT[n]                        R_68K_TLS_DTPMOD32  x
GOT[n+1]                      R_68K_TLS_DTPREL32  x

The R_68K_TLS_GD16 relocation causes the static linker to allocate two
consecutive GOT entries for a tls_index structure and apply the
indicated relocations to them.  The dynamic linker fills in those
entries at runtime.  The code sequence leaves the address of x in a0.
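As a rough mental model of what the call does with the two GOT entries, here is a toy lookup in C. It assumes a per-thread table indexed by module ID and a DTP-relative offset biased by 0x8000 as described earlier in the draft; it is not glibc's implementation, and every `toy_` name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { TOY_DTV_OFFSET = 0x8000, TOY_MAX_MODULES = 4 };

typedef struct {
    unsigned long int ti_module;
    unsigned long int ti_offset;
} tls_index;

/* Hypothetical per-thread table: toy_dtv[m] is the start of module m's block. */
static char *toy_dtv[TOY_MAX_MODULES];

void toy_set_block(unsigned long module, char *block) {
    assert(module < TOY_MAX_MODULES);
    toy_dtv[module] = block;
}

/* Toy stand-in for __tls_get_addr. ti_offset is treated as relative to a
   point TOY_DTV_OFFSET bytes past the start of the module's block, so it is
   usually a wrapped negative number. Integer arithmetic avoids forming
   out-of-bounds pointers. */
void *toy_tls_get_addr(const tls_index *ti) {
    assert(ti->ti_module < TOY_MAX_MODULES && toy_dtv[ti->ti_module] != NULL);
    uintptr_t base = (uintptr_t)toy_dtv[ti->ti_module];
    return (void *)(base + TOY_DTV_OFFSET + (uintptr_t)(long)ti->ti_offset);
}
```

In this model the dynamic linker's job is to fill in `ti_module` and `ti_offset`, and the compiler's job is only to pass the address of that pair to the function.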

On ColdFire, the example code sequence is limited to a 16-bit GOT
offset, as discussed above.  If a larger GOT is required on ColdFire,
a longer instruction sequence must be used; for example:

move.l %a5,%a0
add.l #x@TLSGD,%a0            R_68K_TLS_GD32      x
pea (%a0)
jbsr __tls_get_addr

Local Dynamic TLS model
-----------------------

Code sequence:

pea x@TLSLDM(%a5)             R_68K_TLS_LDM16     x
jbsr __tls_get_addr
...
move.l %a0,%a1
add.l #x1@TLSLDO,%a1          R_68K_TLS_LDO32     x1

Outstanding relocations:

GOT[n]                        R_68K_TLS_DTPMOD32  x

The R_68K_TLS_LDM16 relocation causes the static linker to allocate
two consecutive GOT entries for a tls_index structure and apply the
indicated relocation to the first; the second has a value of 0 and no
relocation.  The dynamic linker fills in those entries at runtime.
The first part of the code sequence leaves the address of the TLS
block for the current module (biased by 0x8000 as discussed above) in
%a0.  The second part of the code sequence determines the address of
x1 based on the address of the TLS block; the static linker resolves
R_68K_TLS_LDO32 to the correct offset from the (biased) DTP value.
Other code sequences may be used to access the value of x1 rather than
computing its address, possibly with R_68K_TLS_LDO16 relocations
depending on whether the size of the TLS area for this module is known
to be at most 64k.

Note that the local dynamic model is generally only beneficial if a
function is accessing more than one TLS variable with this model and
so can reuse the TLS block address.
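The reuse described above can be sketched in C as one base lookup followed by cheap additions. The code assumes the 0x8000 bias from the draft and invents all function names; it models the arithmetic only, not the relocation processing.

```c
#include <assert.h>
#include <stdint.h>

enum { TOY_DTV_OFFSET = 0x8000 };

/* Value the static linker would compute for an LDO relocation in this toy
   model: the variable's offset inside the module block minus the bias. */
int32_t toy_ldo(int32_t offset_in_block) {
    assert(offset_in_block >= 0);
    return offset_in_block - TOY_DTV_OFFSET;
}

/* Biased module base, standing in for the result of the single
   __tls_get_addr call. */
uintptr_t toy_biased_base(const char *block) {
    return (uintptr_t)block + TOY_DTV_OFFSET;
}

/* Address of one variable: biased base plus its LDO value. */
void *toy_ld_address(uintptr_t biased_base, int32_t ldo) {
    return (void *)(biased_base + (uintptr_t)(intptr_t)ldo);
}
```

A function that touches several variables calls `toy_biased_base` once and `toy_ld_address` once per variable, which is where the saving over the general dynamic model comes from.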

The same comments about GOT size apply as for the general dynamic
model.

Initial Exec TLS model
----------------------

Code sequence:

jbsr __m68k_read_tp
...
move.l #x@TLSIE(%a5),%a1      R_68K_TLS_IE16      x
add.l %a0,%a1

Outstanding relocations (apart from those associated with calling
__m68k_read_tp through the PLT):

GOT[n]                        R_68K_TLS_TPREL32   x

The jbsr instruction loads the thread pointer into a0.  This may be
reused for each variable accessed with this model.  Each
R_68K_TLS_IE16 relocation causes the allocation of a single GOT entry
with the indicated relocation; this GOT entry is set up by the dynamic
linker with the offset for that TLS variable relative to the (biased)
thread pointer.  The second part of the code sequence loads this
offset from the GOT and adds the thread pointer to put the address of
x in a1.
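In C terms the initial exec sequence is a load from a GOT slot followed by an addition. The sketch below assumes the 0x7000 thread pointer bias from the draft and uses hypothetical names throughout; the GOT slot is modelled as an ordinary variable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { TOY_TCB_OFFSET = 0x7000 };

/* Value the dynamic linker would store in the GOT slot in this toy model:
   the variable's offset from the biased thread pointer. */
int32_t toy_tprel(int32_t offset_from_tls_start) {
    return offset_from_tls_start - TOY_TCB_OFFSET;
}

/* Initial exec: read the offset from the GOT slot, add the thread pointer. */
void *toy_ie_address(uintptr_t thread_pointer, const int32_t *got_slot) {
    assert(got_slot != NULL);
    return (void *)(thread_pointer + (uintptr_t)(intptr_t)*got_slot);
}
```

The thread pointer is obtained once and can be combined with as many GOT slots as there are variables, matching the remark that a0 may be reused.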

The same comments about GOT size apply as for the general dynamic
and local dynamic models.

Local Exec TLS model
--------------------

Code sequence:

jbsr __m68k_read_tp
...
move.l %a0,%a1
add.l #x@TLSLE,%a1            R_68K_TLS_LE32      x

No outstanding relocations (apart from those associated with calling
__m68k_read_tp through the PLT).

The jbsr instruction loads the thread pointer into a0.  This may be
reused for each variable accessed with this model or the initial exec
model.  The R_68K_TLS_LE32 relocation is resolved by the static linker
to the offset of x relative to the (biased) thread pointer.  The
second part of the code sequence puts the address of x in a1.  Other
code sequences may be used to access the value of x rather than
computing its address, possibly with R_68K_TLS_LE16 relocations
depending on whether all of the TLS area for the executable is known
to be within 32k of the thread pointer.

Debug information
-----------------

DWARF-2 sequence:

DW_OP_addr
.long x@TLSLDO+0x8000         R_68K_TLS_LDO32     x
DW_OP_GNU_push_tls_address

No outstanding relocations.

The static linker resolves the relocation and offset to put the
unbiased address of x relative to the TLS block for its module in the
word of debug information.  GDB then uses this to locate the variable
at debug time.

ELF relocations
---------------

Static relocations:

#define R_68K_TLS_GD32      25
#define R_68K_TLS_GD16      26
#define R_68K_TLS_GD8       27
#define R_68K_TLS_LDM32     28
#define R_68K_TLS_LDM16     29
#define R_68K_TLS_LDM8      30
#define R_68K_TLS_LDO32     31
#define R_68K_TLS_LDO16     32
#define R_68K_TLS_LDO8      33
#define R_68K_TLS_IE32      34
#define R_68K_TLS_IE16      35
#define R_68K_TLS_IE8       36
#define R_68K_TLS_LE32      37
#define R_68K_TLS_LE16      38
#define R_68K_TLS_LE8       39

Dynamic relocations:

#define R_68K_TLS_DTPMOD32  40
#define R_68K_TLS_DTPREL32  41
#define R_68K_TLS_TPREL32   42

@0x59616e
Contributor Author

0x59616e commented Feb 1, 2023

I have a question: what is the difference between jsr and bsr?

They seem to have almost the same functionality, and gcc uses bsr to jump to the __tls_get_addr entry in the PLT.

Here are their respective descriptions:

  • jsr

Pushes the long-word address of the instruction immediately following the
JSR instruction onto the system stack. Program execution then continues at the
address specified in the instruction.

  • bsr

Pushes the long-word address of the instruction immediately following the
BSR instruction onto the system stack. The program counter contains the address of
the instruction word plus two. Program execution then continues at location (PC) +
displacement. The displacement is a twos complement integer that represents the
relative distance in bytes from the current program counter to the destination program
counter.

There is a subtle difference: the description of bsr goes out of its way to state the value of the program counter during execution:

The program counter contains the address of
the instruction word plus two

I guess this may affect the correctness of the PC-relative relocation? I mean, if we use jsr, the relocation may go wrong?

@alexkazik

alexkazik commented Feb 1, 2023

jsr calls a subroutine at an absolute (32-bit) address while bsr calls it relative to PC (either 8- or 16-bit offset).

https://mrjester.hapisan.com/04_MC68/Sect05Part06/Index.html
(The jsr in this document lists only 24 bit jsr, but that's wrong: the 00 is part of the address, just never used on 68000 since it has only 24 bit of address lines.)

@mikrosk

mikrosk commented Feb 1, 2023

It is also worth mentioning that 68020+ also offers the bcc.l variant, including bsr.l, i.e. with a full 32-bit offset.

As for relocation, to have truly relocatable code you can't use jsr/jmp. However, Atari TOS does relocate even absolute code, but of course that requires a table listing each absolute address in the executable, which is part of the format specification.

@jjuran

jjuran commented Feb 2, 2023

While otherwise semantically identical, JSR supports more addressing modes than BSR, including absolute (both long and short), and displaced and indexed modes relative to either the PC or any address register.
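The "instruction word plus two" wording quoted earlier in this thread can be turned into arithmetic. The sketch below models only the BSR target computation as that manual text describes it; the function names are invented, and it makes no claim about how any particular assembler or linker encodes its relocations.

```c
#include <assert.h>
#include <stdint.h>

/* Target of a BSR whose opcode word sits at instr_addr. Per the quoted
   manual text, the base is the opcode address plus two. */
uint32_t bsr_target(uint32_t instr_addr, int32_t displacement) {
    return instr_addr + 2u + (uint32_t)displacement;
}

/* Displacement that has to be encoded so the BSR at instr_addr reaches
   target. */
int32_t bsr_displacement(uint32_t instr_addr, uint32_t target) {
    return (int32_t)(target - (instr_addr + 2u));
}
```

So the "plus two" is a fixed adjustment that whoever computes the displacement has to account for. With an absolute JSR there is no displacement at all, and the instruction simply carries the full target address.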

@0x59616e 0x59616e self-assigned this Feb 3, 2023
@0x59616e
Contributor Author

0x59616e commented Feb 3, 2023

I have a dumb question:

  • The DTP for a module points to 0x8000

What does DTP mean here?

@0x59616e
Contributor Author

0x59616e commented Feb 5, 2023

Candidate patch: D143317.

@glaubitz
Contributor

glaubitz commented Feb 5, 2023

Nice, that was really fast!

PS: I have asked the author of the TLS ABI document for the explanation of what DTP means, but I haven't heard back yet.

@glaubitz
Contributor

glaubitz commented Feb 6, 2023

I have a dumb question:

  • The DTP for a module points to 0x8000

What does DTP mean here?

According to Joseph Myers, DTP stands for "Dynamic Thread Pointer" which is relevant for the linker only.

@0x59616e
Contributor Author

Candidate patch: D144941

@0x59616e
Contributor Author

0x59616e commented Jun 3, 2023

Closed by 4c2ec08.

@0x59616e 0x59616e closed this as completed Jun 3, 2023
@github-project-automation github-project-automation bot moved this from In Progress to Done in Passing LLVM Test Suite Jun 3, 2023