Add new example for section 5 #7
Conversation
The job queue exhibits incorrect behavior and is currently under rework. To reproduce the incorrect behavior, add new jobs after finishing the current ones. A segmentation fault occurs simply because
Solved. Operations on atomic types, including the flag type, covered in C11 standard 7.17.7 and 7.17.8 are all used in the example code. Next I will start revising section 5 based on this example. The undesired result caused by not using atomic operations will be provided as well, along with more clarification on why we need read-modify-write as an atomic step.
Relaxed operations are also beneficial for managing flags shared between threads.
For example, a thread might continuously run until it receives a signal to exit:
Relaxed operations are beneficial for managing flags shared between threads.
Should we mention the discussion around relaxed atomics?
like this one: https://lukegeeson.com/blog/2023-10-17-A-Proposal-For-Relaxed-Atomics/
To demonstrate the application of Test and Set in rmw_example.c: as stated in the Concurrency Primer (§5.2), "We could use Test and Set to build a simple spinlock." In rmw_example.c, Compare and Swap is used to avoid race conditions, while the Test and Set operation via atomic_flag_test_and_set() is only employed for initialization, which underutilizes its locking properties. To illustrate the distinct applications of Test and Set versus Compare and Swap in rmw_example.c, another shared resource would have to be introduced; currently, the only shared resource is the thrd_pool. Consider a scenario where multiple workers all need to access the same shared variable to perform operations. In that case, a mutex can provide functionality similar to a Test and Set flag. In rmw_example.c, "Exchange, Test and Set, Fetch and ..., and Compare and Swap" are all performed using atomic operations from stdatomic.h. However, as mentioned in Chapter 1: "System programmers are familiar with tools such as mutexes, semaphores, and condition variables. Nevertheless, a question remains: How do these tools function, and how can we write concurrent code in their absence?" Therefore, it is not mandatory to use atomic_flag_test_and_set() for Test and Set; a mutex can also achieve the locking and unlocking effects.
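For reference, a minimal Test-and-Set spinlock in the spirit of §5.2 could look like the sketch below; this is an illustration for the discussion, not code taken from rmw_example.c.

```c
#include <stdatomic.h>

/* A minimal spinlock built on Test and Set, as §5.2 suggests.
 * Illustrative sketch only, not taken from rmw_example.c. */
static atomic_flag lock = ATOMIC_FLAG_INIT;

static void spin_lock(void)
{
    /* test_and_set returns the previous value: keep spinning while
     * another thread already holds the lock. */
    while (atomic_flag_test_and_set(&lock))
        ;
}

static void spin_unlock(void)
{
    atomic_flag_clear(&lock);
}
```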
Another idea for Chapter 5 is to retain the original structure, keeping the descriptions of the concepts in sections 5.1, 5.2, 5.3, and 5.4. However, the original examples are not intuitive, because all of these sections are actually about RMW operations. It is important for readers to understand that they differ only in technique, each aimed at different scenarios. At the same time, explain specifically where the need for atomic operations arises; implementing RMW does not necessarily require tools like those found in stdatomic.h. In a subsequent new subsection 5.5, further explanation will detail the atomic functions available to implement these four concepts, and rmw_example.c will demonstrate their application in specific scenarios.
Force-pushed from 5490f3d to bd9ce93 (Compare).
Rebase onto the latest main branch for reviewing.
A new directory /examples is added for example code. .clang-format is copied from the sysprog21/lkmpg project. This commit provides a simplified implementation of a thread pool. After initializing the thread pool with a thread count, jobs can be added. The job queue is an SPMC ring buffer. To keep the implementation minimal, the producer is not protected; as a result, the thread pool cannot run automatically when jobs are added, or the worker may try to get a job before it is fully enqueued. Padding is added in thread_pool_t to avoid false sharing. The number "40" is the sum of the sizes of the struct members, including alignment, before the first padding. There should be a better way to determine this value, since structure packing is implementation defined. The test in the main function results in a non-deterministic order of jobs echoing their ids. A mechanism to wait for all jobs to complete should be added later instead of using sleep. `thread_pool_destroyed()` is not functional yet.
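For context, the padding scheme described above might look roughly like the sketch below; the member names and the byte count are assumptions (a typical LP64 layout), not copied from the commit.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <threads.h>

typedef struct job job_t;

/* Illustrative layout only: fields written by different threads are
 * pushed onto separate 64-byte cache lines to avoid false sharing. */
typedef struct thread_pool {
    atomic_flag initialized;   /* 1 byte, padded to 4 (assumed)          */
    _Atomic int state;         /* 4 bytes                                */
    size_t thread_count;       /* 8 bytes                                */
    thrd_t *threads;           /* 8 bytes                                */
    job_t *head;               /* 8 bytes                                */
    _Atomic size_t queue_head; /* 8 bytes -> 40 bytes so far (assumed)   */
    char padding[64 - 40];     /* "40" depends on implementation-defined
                                  packing, hence the fragility noted above */
    _Atomic size_t queue_tail; /* consumed by workers on another line    */
} thread_pool_t;
```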
Code executed under the running condition is placed in the same scope.
Both next and prev in job_t are struct job pointers, thus they are declared on the same line.
An assert is added before malloc. Also, the type of size is changed to size_t.
Two macros `CAST_JOB(job, type)` and `PREV_JOB(job)` are added to simplify long expressions and improve readability.
A new struct `idle_job` is added. There are several ways to make `thrd_pool->head->prev` (the original tail) atomic: (1) `_Atomic`, used as either a specifier or a qualifier in C11, acts on an object, not a region, so we cannot have only the idle `job_t` carry an atomic `prev`; all `job_t` would have an atomic `prev`. However, `_Atomic` is only allowed to act on a complete type, meaning that `_Atomic(struct job *)` and `_Atomic(void *)` are not allowed in the declaration of `job_t`. `atomic_uintptr` has already shown enough casting chaos in the previous commit. (2) Embed `job_t` in a new struct `idle_job` alongside `_Atomic(job_t *) prev;`. In the worker function, the last job is accessed through `thrd_pool->head->prev`, the same as in (1). The only difference is how the idle job is initialized and how the first job is added. Padding could also be added around `prev` to avoid false sharing. The test in main demonstrates that a series of jobs is added after finishing the existing ones. `thread_pool_destroy` is implemented to cancel and free the pool. An additional sleep could be added before destroy to observe the second series of jobs. Notice that freeing the memory of a job in the worker directly after using it may leave dangling pointers in other threads. Safe memory reclamation should be introduced to avoid this completely, or a memory pool could be used for jobs.
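A minimal sketch of option (2), with names assumed rather than copied from the commit:

```c
#include <stdatomic.h>

typedef struct job {
    void *args;
    struct job *next, *prev;   /* non-atomic links inside the list */
} job_t;

/* Option (2): wrap the idle (sentinel) job instead of making every prev
 * atomic. Only this sentinel needs an atomic prev, and padding can be
 * placed around it to keep it on its own cache line. */
typedef struct idle_job {
    _Atomic(job_t *) prev;     /* the original tail, updated with RMW ops */
    char padding[64 - sizeof(_Atomic(job_t *))];
    job_t job;                 /* the embedded sentinel job itself */
} idle_job_t;
```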
An atomic flag is used for checking whether the given thread pool has been initialized. The flag is initialized when the thread pool struct is declared and reset to false when the thread pool is destroyed. Atomic exchange obtains the previous state when destroying the thread pool and gives a warning message if the state was running. Atomic fetch-and-AND with zero demonstrates a way to set the state to idle.
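The three operations described above could look roughly like this sketch (names and state values are assumptions, not copied from the commit):

```c
#include <stdatomic.h>
#include <stdio.h>

enum { IDLE = 0, RUNNING = 1 };

static atomic_flag initialized = ATOMIC_FLAG_INIT;
static _Atomic int state = IDLE;

void pool_init(void)
{
    /* Test and Set: only the first caller sees `false` and initializes. */
    if (atomic_flag_test_and_set(&initialized))
        return;  /* already initialized */
    /* ... set up the pool ... */
}

void pool_destroy(void)
{
    /* Exchange: grab the previous state while forcing it to IDLE. */
    if (atomic_exchange(&state, IDLE) == RUNNING)
        fprintf(stderr, "warning: destroying a running pool\n");
    atomic_flag_clear(&initialized);  /* allow re-initialization */
}

void pool_idle(void)
{
    /* Fetch and AND with zero: another way to force the state to IDLE. */
    atomic_fetch_and(&state, 0);
}
```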
Both job_count and thread_count were meant to be constant in the given test scenario. Thus they were specified as macros instead of variables.
Clarification on read-modify-write is first added at the beginning of the section. To base the discussion on atomic load/store, more information is supplemented at the end of section 2. Example code is included using the minted package. Each subsection is revised according to the atomic library usage in the example. At the end of section 5, a new subsection "Further improvements" is added to discuss leveraging other memory orders, false sharing, and safe memory reclamation. The first two topics are forward-referenced to the corresponding chapters. The last one is not covered in this book, so it has no reference.
rmw_example.c is committed in the last commit.
A diff file is added to patch the original example into one that can cause races. Substituting `threads.h` with `pthread.h` is also included in the diff because the sanitizer does not support C11 threads yet. How the sanitizer works and how to use it are added as well, followed by explanations of the warning messages from TSan. The part mentioning safe memory reclamation is moved to this subsection because the TSan warning messages mention it. A missing reference to the spinlock (originally one of the RMW examples) in section 9 was added back as a new code block.
The original statement is only true for the successful operation. For the other, failed operations, it is the successful one that finished before they did. The new statement still does not fully cover the characteristics of atomic operations: it is the generated cmpxchg or LL/SC loop that makes the operation keep retrying and eventually finish. But considering that the purpose of this paragraph is to paint a big picture of ordering and atomicity, more details on atomic operations should be covered in the section "Atomic as building blocks". More references to compiler and CPU manufacturer documents should be taken into consideration then.
The intro of section 10 originally referenced back to the spinlock in section 5. It is now replaced with the new example. Section 10.2 originally referenced back to the UI thread in section 5. That reference is now removed, as the new example presented below is used to explain the relaxed memory model. The new example is used here as well because it is exactly what the original example was describing.
Static linkage is added for better practice. A new inline function wait_until is added to serve the need of waiting for the thread pool to reach a given state, which removes sleep() and the corresponding header. The weak version of compare and swap is used instead because: 1. there is really nothing else on the same cache line to cause a spurious failure; 2. the retry cost is considered lower than a nested loop.
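To illustrate the reasoning behind the weak variant, a generic retry loop might look like the following; this is only a sketch of the pattern (names are assumptions), not the exact code in the example.

```c
#include <stdatomic.h>

/* A spurious failure of the weak CAS simply costs one more iteration of
 * the loop that is already there, which is judged cheaper than nesting a
 * strong CAS inside another loop. */
static void set_state(_Atomic int *state, int from, int to)
{
    int expected = from;
    while (!atomic_compare_exchange_weak(state, &expected, to))
        expected = from;  /* reset and retry until the transition succeeds */
}
```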
1. Use the Bailey–Borwein–Plouffe formula to approximate pi (see the sketch after this list).
   - Reference: https://github.com/sysprog21/concurrent-programs/blob/master/tpool/tpool.c
2. Add a PRECISION constant with value 100.
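For reference, one term of the BBP series could be computed as in the sketch below (function name and signature are assumptions; the job function in the linked tpool.c may differ). Summing terms k = 0 .. PRECISION approximates pi.

```c
#include <math.h>   /* link with -lm */

/* One term of the Bailey–Borwein–Plouffe series for pi. */
static double bbp(int k)
{
    return 1.0 / pow(16, k) *
           (4.0 / (8 * k + 1) - 2.0 / (8 * k + 4) -
            1.0 / (8 * k + 5) - 1.0 / (8 * k + 6));
}
```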
1. Add the `tpool_future` variable (sketched below).
   - `tpool_future` passes the result to the main thread.
   - A mutex lock and a condition variable ensure correct synchronization.
2. The main thread sequentially accumulates the BBP results calculated by each worker.
   - It waits in `tpool_future_get()` until the condition variable is broadcast, confirming that the result has been marked as `__FUTURE_FINISHED`.
3. Rename `thread_pool` to `tpool` to improve readability.
4. Add the Makefile.
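A sketch of the future object and the blocking wait described in items 1 and 2; the struct members are assumptions, and C11 threads are used here for consistency with the rest of the examples, while the actual code may use a different threading API.

```c
#include <threads.h>

#define __FUTURE_FINISHED (1 << 0)

typedef struct tpool_future {
    int flag;          /* bit field of future states                     */
    void *result;      /* written by the worker, read by the main thread */
    mtx_t mutex;       /* protects flag and result                       */
    cnd_t cond;        /* broadcast when the job finishes                */
} tpool_future_t;

/* Block until the worker broadcasts that the result is ready. */
static void *tpool_future_get(tpool_future_t *future)
{
    mtx_lock(&future->mutex);
    while (!(future->flag & __FUTURE_FINISHED))
        cnd_wait(&future->cond, &future->mutex);
    mtx_unlock(&future->mutex);
    return future->result;
}
```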
1. Directly show the scenarios using Test and Set and its atomic operations.
   - Use `atomic_flag_test_and_set()` and `atomic_flag_clear()` to implement the original mutex lock and unlock mechanism.
   - Replace the original condition-variable wait mechanism with `atomic_flag_test_and_set()` combined with a `while` loop.
2. Avoid deadlock in `tpool_future_get()`.
   - The main thread must first wait for the worker to complete the "BBP formula" job.
   - Subsequently, it should wait for the worker to unlock.
   - These two operations must occur in this order; swapping them leads to deadlock.
1. Check if `future->result` is NULL.
   - If `future->result` is NULL, the job is still in progress.
   - If `future->result` is not NULL, the job has been completed by the worker.
1. When allocating memory for `future`, if the allocation fails, do not simply return NULL. Instead, release the memory allocated for `job` beforehand to avoid a memory leak.
1. When creating the future, set the future's flag, which is akin to assigning the job. Afterward, transfer ownership to the worker. Once the worker completes the job, clear the flag and return ownership, which is akin to submitting the job. Then the main thread can regain ownership. By doing this, the main thread can wait directly for the result through test and set, without checking whether the result is NULL. This avoids the situation where the flag could be set to true by the main thread before the worker starts the job. Additionally, the worker does not need to check with test and set before performing the job. (A sketch of this protocol follows below.)
2. Drop the `atomic_flag_clear` in the `tpool_future_wait` function and directly free the pointer of the future and its result in the `tpool_future_destroy` function.
3. Rename the variable `lock` in the future structure to `flag`. Rename the function `tpool_future_get` to `tpool_future_wait`.
Co-authored-by: Chih-Wei Chien <[email protected]> Signed-off-by: Wei-Hsin Yeh <[email protected]>
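A sketch of the ownership handoff in item 1; names besides `tpool_future_wait` are assumptions, and the flag is assumed to start cleared when the future is allocated.

```c
#include <stdatomic.h>

typedef struct tpool_future {
    atomic_flag flag;  /* set = job owned by the worker, clear = finished */
    void *result;
} tpool_future_t;

/* Main thread: setting the flag when the future is created is akin to
 * assigning the job and handing ownership to the worker. */
static void future_assign(tpool_future_t *f)
{
    atomic_flag_test_and_set(&f->flag);
}

/* Worker: clearing the flag after the job is done hands ownership (and
 * the result) back, akin to submitting the job. */
static void future_submit(tpool_future_t *f, void *result)
{
    f->result = result;
    atomic_flag_clear(&f->flag);
}

/* Main thread: test-and-set keeps returning true while the worker still
 * owns the flag; the first false both signals completion and regains
 * ownership, so no NULL check on the result is needed. */
static void *tpool_future_wait(tpool_future_t *f)
{
    while (atomic_flag_test_and_set(&f->flag))
        ;
    return f->result;
}
```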
1. When allocating memory for the product, if the allocation fails, it returns NULL. Co-authored-by: Chih-Wei Chien <[email protected]> Signed-off-by: Wei-Hsin Yeh <[email protected]>
1. Use two figures to connect concepts from the first three sections.
   - Figure atomic_rmw illustrates that an atomic operation consists not of a single operation but of a group of operations that need to be performed atomically.
   - Figure rmw_communicate shows how this atomic group of operations can be used on a shared resource for communication.
2. Discuss how to ensure that the operations accessing the shared resource for communication between concurrent threads are correct.
   - Use Test and Set and Compare and Swap as examples to illustrate how this can be achieved.
3. Compare the usage scenarios of Exchange and Fetch and ...
4. Introduce the concept that we can utilize atomic operations to ensure that a group of operations is performed atomically.
Introducing the thread sanitizer here may be an unexpected surprise for readers who are new to concurrency. Here we focus on RMW atomic operations instead, thus the related content and diff file are removed. The proper place for this topic could be a dedicated section on "testing, debugging and verifying concurrent programs". This also aligns with the decision to stick to C11 threads. Co-authored-by: Wei-Hsin Yeh <[email protected]>
Since the spinlock is added back in section 5.2, the original content is restored. As with the RMW example, the goal is to provide an easy-to-understand example first and improve it later on.
References to the C11 standard were added when explaining the properties of atomic types and operations. More information on code generation for atomic operations is added as a footnote, with a link to LLVM's documentation as an example.
Add a description of atomic instructions to let readers know there is a difference between using fetch-and-..., which is only a programming tool, and its actual execution as an atomic operation, which depends on the compiler (see the sketch below). Simplify the rmw_example code to provide more flexible examples.
- Initially, all worker threads will be initialized. The main thread will ask all workers to start running. If there is no job or the job is completed, the worker will become idle. Next, the main thread will continue to add more jobs and ask the workers to start running again. Meanwhile, the main thread will also wait for the results of the work.
- Use the struct `tpool_future` to record all the information required for the job.
Co-authored-by: Chih-Wei Chien <[email protected]>
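As a small illustration of the gap between the source-level tool and the generated code (hypothetical snippet, not from the example): the same fetch-and-add call may compile to a single atomic instruction or to a retry loop, depending on the compiler and target.

```c
#include <stdatomic.h>

static _Atomic int counter;

void bump(void)
{
    /* Source level: one library call. Machine level: the compiler decides
     * whether this becomes a single atomic instruction (e.g. lock xadd on
     * x86) or a load-linked/store-conditional loop on other targets. */
    atomic_fetch_add(&counter, 1);
}
```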
Thanks to @idoleat for contributing!
A new directory /examples is added for example code. .clang-format is copied from the sysprog21/lkmpg project.
This draft PR provides a simplified implementation of a thread pool. After initializing the thread pool with a thread count, jobs can be added. The job queue is an SPMC ring buffer. To keep the implementation minimal, the producer is not protected; as a result, the thread pool cannot run automatically when jobs are added, or the worker may try to get a job before it is fully enqueued.
Padding is added in `thread_pool_t` to avoid false sharing. The number "40" is the sum of the sizes of the struct members, including alignment, before the first padding. There should be a better way to determine this value, since structure packing is implementation defined. The test in the main function results in a non-deterministic order of jobs echoing their ids. A mechanism to wait for all jobs to complete should be added later instead of using sleep. `thread_pool_destroyed()` is not functional yet.
To explain `Exchange`, `Test and set`, `Fetch and ...`, and `Compare and swap` in sections 5.1~5.4 using this example, the following issues should be resolved:
- `Exchange` is not in use.
- `Test and set` on the `initialized` flag is useless. Testing the flag on the first thread pool initialization retrieves a non-deterministic value. Currently, re-initialization is still possible. If preventing re-initialization is not crucial in this example, we need to find another way to use test and set.
- `Fetch and add` to change the thread pool state seems too intentional.
Should we break the example into pieces to explain individually? Or list the code first?