Introduction

This project provides a tool for extracting data items from patent materials.

Usage

python3 -m pip install -r requirements.txt

python3 main.py --input_dir <path/to/directory/of/patents> [--output_dir <path/to/output/directory>] [--ckpt <path/to/customized/ckpt>]

To improve the LLM's ability to extract electrolyte-related information, we use supervised fine-tuning to adapt the behavior of a pretrained LLM. First build the training set, then run the fine-tuning script:

python3 create_dataset.py --input datasets/origin.json --output datasets/trainset.jsonl

python3 finetune.py --pretrained_ckpt <hugging/face/model/id> --sft_ckpt <path/to/ckpt> --dataset <path/to/dataset> --device (cuda|cpu)
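As a rough illustration of the dataset step, the sketch below converts a JSON array into a JSONL file with one training record per line. It is not the actual code of create_dataset.py: the input schema (the `text` and `labels` fields), the prompt wording, and the `prompt`/`completion` output keys are all assumptions made for the example.

```python
import json
from pathlib import Path


def to_sft_record(item: dict) -> dict:
    """Map one source record to a prompt/completion pair.

    The field names "text" and "labels" are hypothetical; the real
    schema of datasets/origin.json may differ.
    """
    prompt = (
        "Extract electrolyte-related information from the following "
        "patent text:\n" + item["text"]
    )
    # Serialize the target labels as a JSON string so the model learns
    # to emit structured output.
    completion = json.dumps(item["labels"], ensure_ascii=False, sort_keys=True)
    return {"prompt": prompt, "completion": completion}


def convert(input_path: str, output_path: str) -> int:
    """Read a JSON array and write one JSON object per line (JSONL)."""
    items = json.loads(Path(input_path).read_text(encoding="utf-8"))
    with open(output_path, "w", encoding="utf-8") as out:
        for item in items:
            out.write(json.dumps(to_sft_record(item), ensure_ascii=False) + "\n")
    return len(items)
```

Under these assumptions, `convert("datasets/origin.json", "datasets/trainset.jsonl")` would return the number of records written. JSONL keeps each example on its own line, so a training script can stream the file without loading it all into memory.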
