
Commit 6722657

add basic AC configs for 13B and 70B (#169)

As titled. Currently the 13B config uses selective op-level AC and the 70B config uses selective per-layer AC; we can run more experiments and adjust the configs later.

1 parent e28832e commit 6722657

File tree: 2 files changed (+10, −2 lines)


train_configs/llama_13b.toml

Lines changed: 5 additions & 1 deletion
@@ -30,7 +30,6 @@ seq_len = 4096
 warmup_steps = 200 # lr scheduler warm up
 max_norm = 1.0 # grad norm clipping
 steps = 1000
-# only dp would be sufficient for 7B
 data_parallel_degree = -1
 # 8-way TP, adjust to 2/4 for local(single host) runs
 tensor_parallel_degree = 8
@@ -41,3 +40,8 @@ checkpoint_interval = 3600
 checkpoint_interval_type = "steps"
 checkpoint_folder = ""
 dataset = "openwebtext"
+
+
+[activation_checkpoint]
+mode = 'selective' # ['none', 'full', 'selective']
+selective_ac_option = 'op' # 'int' = ac every positive int layer or 'op', ac based on ops policy
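For the 13B config, `selective_ac_option = 'op'` selects an op-based policy: rather than checkpointing whole layers, only the outputs of ops that are expensive to recompute are saved, and cheap ops are recomputed in the backward pass. A minimal sketch of that idea in plain Python (the op names and the `save_output` helper are illustrative assumptions, not torchtitan's actual implementation):

```python
# Hypothetical op-level selective AC policy (illustrative names, not
# torchtitan's actual code). Matmul-family ops are costly to redo in
# the backward pass, so their outputs are saved; everything else
# (e.g. pointwise ops) is recomputed instead of stored.
EXPENSIVE_OPS = {"aten.mm", "aten.addmm", "aten.bmm"}

def save_output(op_name: str) -> bool:
    """True if this op's output should be stashed for the backward
    pass rather than recomputed."""
    return op_name in EXPENSIVE_OPS
```

The trade-off is memory for compute: saving matmul outputs avoids the most expensive recomputation while still freeing the activations of cheap ops.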

train_configs/llama_70b.toml

Lines changed: 5 additions & 1 deletion
@@ -30,7 +30,6 @@ seq_len = 4096
 warmup_steps = 200 # lr scheduler warm up
 max_norm = 1.0 # grad norm clipping
 steps = 1000
-# only dp would be sufficient for 7B
 data_parallel_degree = -1
 # 8-way TP
 tensor_parallel_degree = 8
@@ -41,3 +40,8 @@ checkpoint_interval = 3600
 checkpoint_interval_type = "steps"
 checkpoint_folder = ""
 dataset = "openwebtext"
+
+
+[activation_checkpoint]
+mode = 'selective' # ['none', 'full', 'selective']
+selective_ac_option = '2' # 'int' = ac every positive int layer or 'op', ac based on ops policy
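For the 70B config, `selective_ac_option = '2'` means "checkpoint every 2nd layer". A hedged sketch of how a trainer might interpret the `[activation_checkpoint]` settings (the function and its arguments are assumptions for illustration, not torchtitan's actual API):

```python
# Hypothetical interpretation of the [activation_checkpoint] config
# (illustrative only, not torchtitan's actual implementation).
def should_checkpoint_layer(layer_idx: int, mode: str, option: str) -> bool:
    """Decide whether to wrap a transformer layer in activation
    checkpointing.

    mode:   'none' | 'full' | 'selective'
    option: 'op' (op-level policy applied inside every layer), or a
            positive int as a string, meaning "AC every N-th layer".
    """
    if mode == "none":
        return False
    if mode == "full":
        return True
    # mode == 'selective'
    if option == "op":
        # Op-level selective AC: every layer is wrapped; a separate
        # policy picks which ops inside the layer to recompute.
        return True
    ac_freq = int(option)  # e.g. '2' -> checkpoint every 2nd layer
    return layer_idx % ac_freq == 0
```

With `option = '2'`, layers 0, 2, 4, … get checkpointed, halving activation memory for roughly half the recompute cost of full AC.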
