Commit f05e712

enable some function by default (#148)
1 parent 0188361 commit f05e712

File tree

2 files changed: +15 additions, -27 deletions


README.md

Lines changed: 10 additions & 25 deletions

````diff
@@ -123,7 +123,7 @@ For Dynamic Axial Parallelism, you can refer to `./inference.py`. Here is an example
 
 ```shell
 python inference.py target.fasta data/pdb_mmcif/mmcif_files/ \
-    --output_dir ./ \
+    --output_dir .outputs/ \
     --gpus 2 \
     --uniref90_database_path data/uniref90/uniref90.fasta \
     --mgnify_database_path data/mgnify/mgy_clusters_2022_05.fa \
@@ -133,44 +133,28 @@ python inference.py target.fasta data/pdb_mmcif/mmcif_files/ \
     --jackhmmer_binary_path `which jackhmmer` \
     --hhblits_binary_path `which hhblits` \
     --hhsearch_binary_path `which hhsearch` \
-    --kalign_binary_path `which kalign`
+    --kalign_binary_path `which kalign` \
+    --enable_workflow \
+    --inplace
 ```
 Or run the script `./inference.sh`; you can change the parameters in the script, especially the data paths.
 ```shell
 ./inference.sh
 ```
 
-#### inference with data workflow
-AlphaFold's data pre-processing takes a lot of time, so we speed it up with a [ray](https://docs.ray.io/en/latest/workflows/concepts.html) workflow, which is about 3x faster. To run inference with the ray workflow, you should install the packages and add the parameter `--enable_workflow` to the command line or the shell script `./inference.sh`:
-```shell
-pip install ray==2.0.0 pyarrow
-```
-```shell
-python inference.py target.fasta data/pdb_mmcif/mmcif_files/ \
-    --output_dir ./ \
-    --gpus 2 \
-    --uniref90_database_path data/uniref90/uniref90.fasta \
-    --mgnify_database_path data/mgnify/mgy_clusters_2022_05.fa \
-    --pdb70_database_path data/pdb70/pdb70 \
-    --uniref30_database_path data/uniref30/UniRef30_2021_03 \
-    --bfd_database_path data/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
-    --jackhmmer_binary_path `which jackhmmer` \
-    --hhblits_binary_path `which hhblits` \
-    --hhsearch_binary_path `which hhsearch` \
-    --kalign_binary_path `which kalign` \
-    --enable_workflow
-```
+AlphaFold's data pre-processing takes a lot of time, so we speed it up with a [ray](https://docs.ray.io/en/latest/workflows/concepts.html) workflow, which is about 3x faster. `--enable_workflow` is now passed by default to run inference with the ray workflow.
+To reduce the memory usage of the embedding representations, `--inplace` is also passed by default to share memory.
 
 #### inference with lower memory usage
 AlphaFold's embedding representations take up a lot of memory as the sequence length increases. To reduce memory usage,
-you should add the parameters `--chunk_size [N]` and `--inplace` to the command line or the shell script `./inference.sh`.
+you should add the parameter `--chunk_size [N]` to the command line or the shell script `./inference.sh`.
 The smaller you set N, the less memory is used, but speed is also affected. We can run inference on
 a sequence of length 10000 in bf16 with 61 GB of memory on an NVIDIA A100 (80 GB). For fp32, the maximum length is 8000.
 > You need to set `PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:15000` to run inference on such an extremely long sequence.
 
 ```shell
 python inference.py target.fasta data/pdb_mmcif/mmcif_files/ \
-    --output_dir ./ \
+    --output_dir .outputs/ \
     --gpus 2 \
     --uniref90_database_path data/uniref90/uniref90.fasta \
     --mgnify_database_path data/mgnify/mgy_clusters_2022_05.fa \
@@ -181,8 +165,9 @@ python inference.py target.fasta data/pdb_mmcif/mmcif_files/ \
     --hhblits_binary_path `which hhblits` \
     --hhsearch_binary_path `which hhsearch` \
     --kalign_binary_path `which kalign` \
-    --chunk_size N \
-    --inplace
+    --enable_workflow \
+    --inplace \
+    --chunk_size N
 ```
 
 #### inference multimer sequence
````
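The `--chunk_size [N]` parameter bounds peak memory by processing a long sequence in fixed-size slices instead of all at once. A minimal NumPy sketch of the general idea (a generic chunked computation for illustration, not FastFold's actual implementation):

```python
import numpy as np

def chunked_matmul(a: np.ndarray, b: np.ndarray, chunk_size: int) -> np.ndarray:
    """Compute a @ b over row slices of `a`, so only one
    chunk-sized intermediate is live at any time."""
    out = np.empty((a.shape[0], b.shape[1]), dtype=np.result_type(a, b))
    for start in range(0, a.shape[0], chunk_size):
        stop = min(start + chunk_size, a.shape[0])
        out[start:stop] = a[start:stop] @ b  # one slice at a time
    return out
```

The smaller the chunk, the smaller the peak intermediate but the more loop iterations, mirroring the memory-vs-speed trade-off the README describes for `N`.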

inference.sh

Lines changed: 5 additions & 2 deletions

````diff
@@ -5,7 +5,8 @@
 # add '--inplace' to use inplace to save memory
 
 python inference.py target.fasta data/pdb_mmcif/mmcif_files \
-    --output_dir ./ \
+    --output_dir ./outputs \
+    --gpus 2 \
     --uniref90_database_path data/uniref90/uniref90.fasta \
     --mgnify_database_path data/mgnify/mgy_clusters_2022_05.fa \
     --pdb70_database_path data/pdb70/pdb70 \
@@ -14,4 +15,6 @@ python inference.py target.fasta data/pdb_mmcif/mmcif_files \
     --jackhmmer_binary_path `which jackhmmer` \
     --hhblits_binary_path `which hhblits` \
     --hhsearch_binary_path `which hhsearch` \
-    --kalign_binary_path `which kalign`
+    --kalign_binary_path `which kalign` \
+    --enable_workflow \
+    --inplace
````
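The `--inplace` flag this commit turns on saves memory by writing results back into existing buffers instead of allocating fresh ones for each step. A rough NumPy illustration of that general technique (hypothetical helper functions, not FastFold's code):

```python
import numpy as np

def scale_out_of_place(x: np.ndarray) -> np.ndarray:
    # Allocates a new array: peak memory is roughly 2x the input size.
    return x * 2.0

def scale_inplace(x: np.ndarray) -> np.ndarray:
    # Writes into the caller's buffer: no extra allocation.
    np.multiply(x, 2.0, out=x)
    return x

x = np.ones(4)
y = scale_inplace(x)
assert y is x  # the same buffer is shared, cutting peak memory for this step
```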
