ddsntc1/Visual_information_extraction

Visual Information Extraction Task Report

Author: Dongwook Kang (강동욱)

🤗 Dongwooks HF-repo

The trained model is too large to upload with git, so it is hosted on Hugging Face.

Result_Model : VIE_TASK_v5

HuggingFace Colab

Table of Contents

  1. Summary
  2. Experimental Results
  3. Instructions
  4. Approach
  5. Conclusion and Future Work

1. Summary

1.1 Task Objective

This project addresses information extraction, a core task in document understanding. Specifically, it develops a deep learning model that automatically extracts key fields such as company name, date, address, and total amount from receipts, with the goal of a model accurate and efficient enough for document-processing automation.

To that end, the work builds on LayoutLMv3, a recent document understanding model, and studies how to make effective use of both text and spatial information. By introducing a sliding window technique and improving data quality, the project aimed for performance usable in a real working environment. The final model reached an F1 score of 84.96 and an exact-match accuracy (EM) of 50.43.

1.2 Dataset Composition

  • Input data

    • Word information
    • Normalised bounding box coordinates
    • Image information
  • Extraction targets (labels)

    • Company
    • Date
    • Address
    • Total
    • Others

1.3 Approach

  • Fine-tuning the LayoutLMv3 model
  • Applying the sliding window technique
  • Improving data labeling quality

2. Experimental results

2.1 Evaluation Metrics

| Version | F1 | EM | EM_no_space | Main changes |
|---|---|---|---|---|
| v1 | 76.7090 | 36.1671 | 36.1671 | Baseline implementation |
| v2 | 80.6816 | 50.0000 | 50.0000 | Improved data processing, tuned training parameters |
| v3 | 80.4974 | 49.7839 | 49.7839 | Attempted to extend max_length |
| v4 | 84.9540 | 50.9366 | 50.9366 | Sliding window applied at inference |
| v5 | 84.9650 | 50.4323 | 50.4323 | Sliding window applied at training |
| v6 | 83.7073 | 52.0173 | 52.0173 | Improved data labeling quality |
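
The report does not define how EM and EM_no_space are computed. As a rough sketch under one plausible reading (an assumption, not the evaluation script actually used), EM could be the percentage of predicted field strings that equal the ground truth exactly, and EM_no_space the same comparison after removing all whitespace. The function names and the per-field counting are hypothetical.

```python
def exact_match(pred: str, gold: str, ignore_space: bool = False) -> bool:
    """Compare two field strings, optionally ignoring all whitespace."""
    if ignore_space:
        pred = "".join(pred.split())
        gold = "".join(gold.split())
    return pred == gold


def em_score(pairs, ignore_space: bool = False) -> float:
    """Percentage of (prediction, ground truth) pairs that match exactly.

    Assumption: the score is counted per field, not per receipt.
    """
    if not pairs:
        return 0.0
    hits = sum(exact_match(p, g, ignore_space) for p, g in pairs)
    return 100.0 * hits / len(pairs)


pairs = [
    ("ABC SDN BHD", "ABC SDN BHD"),   # exact match
    ("ABC SDNBHD", "ABC SDN BHD"),    # matches only when spaces are ignored
    ("10.00", "12.00"),               # wrong value
    ("01/02/2018", "01/02/2018"),     # exact match
]
em = em_score(pairs)
em_no_space = em_score(pairs, ignore_space=True)
```

Under this reading, identical EM and EM_no_space values, as in every row of the table, would mean that no prediction differed from the ground truth only in spacing.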

2.2 Analysis of Results

  1. Performance trend

    • Improved data processing raised F1 from 76.71 (v1) to 80.68 (v2)
    • Introducing the sliding window raised F1 to 84.96 (v4, v5)
    • Improved data quality raised EM to 52.02 (v6), the highest EM of all versions, while F1 fell to 83.71
  2. Main improvements

    • Optimized data preprocessing
    • Resolved the token-length limit
    • Improved labeling quality

3. Instructions

3.1 Hardware/Software Environment

  • Development platform: Google Colab
    • Data preprocessing: CPU runtime
    • Model training/inference: GPU (L4) runtime

3.2 Key Libraries

- transformers==4.46.2
- torch==2.5.1+cu121
- datasets==3.1.0  
- huggingface-hub==0.26.2
- seqeval==1.2.2
- sentence-transformers==3.2.1

3.3 Project Structure

Visual_information_extraction/
├── look_data.ipynb - Notebook created to inspect the state of the data.
├── making_dataset&modify_dataset.ipynb - Builds the Dataset from the provided *.txt files and img data, and also contains the work done to improve the training data.
├── model_training_main.ipynb - Code for the model training process.
└── data/
    ├── train/
    │   ├── entities/
    │   └── img/
    ├── test/
    │   ├── entities/
    │   └── img/
    ├── op_test.txt
    ├── op_test_box.txt
    ├── op_test_image.txt
    ├── train.txt
    ├── train_box.txt
    ├── train_image.txt
    ├── test.txt
    ├── test_box.txt
    └── test_image.txt

3.4 making_dataset&modify_dataset.ipynb

making_dataset.ipynb

Dataset creation and preprocessing

1. Environment setup and data validation

# Install the required libraries
!pip install transformers datasets seqeval
!git lfs install

# Data integrity check
def validate_data():
    for target in ['train', 'test', 'op_test']:
        # Cross-check the files of each split against one another
        # - image_path: bounding box coordinates
        # - text_path: text and labels
        # - box_path: normalized coordinates
        ...
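
The cross-check outlined above could look roughly like the following. This is a minimal sketch, not the notebook's code: it assumes each file holds one word per line with documents separated by blank lines, and the helper names `count_records` and `validate_split` are made up for illustration.

```python
from pathlib import Path


def count_records(path):
    """Return (non-empty lines, blank-line-separated documents) for one file."""
    lines = 0
    docs = 0
    in_doc = False
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        if raw.strip():
            lines += 1
            if not in_doc:      # first line after a blank line starts a document
                docs += 1
                in_doc = True
        else:
            in_doc = False
    return lines, docs


def validate_split(data_dir, target):
    """Check that <target>.txt, <target>_box.txt and <target>_image.txt agree."""
    names = [f"{target}.txt", f"{target}_box.txt", f"{target}_image.txt"]
    counts = {name: count_records(Path(data_dir) / name) for name in names}
    if len(set(counts.values())) != 1:
        raise ValueError(f"{target}: files disagree: {counts}")
    return counts[names[0]]
```

Calling `validate_split("data", "train")` would return the shared (word count, document count) pair, or raise if any of the three files is out of step with the others.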

2. Dataset creation

def create_dataset(target_type):
    """Create the training/test dataset."""
    dataset = []

    # File paths
    txt_path = f"data/{target_type}.txt"
    box_path = f"data/{target_type}_box.txt"
    image_path = f"data/{target_type}_image.txt"

    # Load and process the data; each record holds:
    # - words: text
    # - bboxes: bounding box coordinates
    # - norm_bboxes: normalized coordinates
    # - labels: BIO tagging information

    # Save as JSON
    save_to_json(f"{target_type}_dataset.json")
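
To make the relationship between `bboxes` and `norm_bboxes` concrete: LayoutLM-family models expect box coordinates scaled to the 0-1000 range relative to the page size. The sketch below shows that scaling and one record in the shape listed above. The `build_record` helper and the sample values are illustrative assumptions, not taken from the notebook.

```python
import json


def normalize_bbox(bbox, width, height):
    """Scale a pixel box [x0, y0, x1, y1] to the 0-1000 range, clamped to bounds."""
    x0, y0, x1, y1 = bbox
    scaled = [1000 * x0 / width, 1000 * y0 / height,
              1000 * x1 / width, 1000 * y1 / height]
    return [min(1000, max(0, int(v))) for v in scaled]


def build_record(words, bboxes, labels, image_size):
    """Assemble one document record (hypothetical layout)."""
    width, height = image_size
    return {
        "words": list(words),
        "bboxes": [list(b) for b in bboxes],
        "norm_bboxes": [normalize_bbox(b, width, height) for b in bboxes],
        "labels": list(labels),
    }


record = build_record(
    words=["TOTAL", "12.00"],
    bboxes=[[50, 100, 150, 200], [300, 100, 400, 200]],
    labels=["O", "S-TOTAL"],
    image_size=(500, 1000),
)
serialized = json.dumps(record)  # ready to be written to <target>_dataset.json
```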

3. Dataset quality improvement

def enhance_dataset():
    # 1. Use spatial information
    def group_by_lines(words, bboxes, y_threshold=5):
        """Group words into lines by vertical position."""
        # Identify text lines from the y coordinates
        # Sort the words within each line

    # 2. Matching rules per entity
    def find_company_line(lines, company):
        """Rules for identifying the company name."""
        # Located near the top of the document
        # Uses specific keywords

    def find_address_line(lines, address):
        """Rules for identifying the address."""
        # Start markers: NO., LOT, JALAN, etc.
        # End markers: MALAYSIA, DARUL EHSAN
        # Stop markers: TEL, FAX, EMAIL

    def find_date_line(lines, target_date):
        """Rules for identifying the date."""
        # Date format pattern matching

    def find_total_line(lines, total):
        """Rules for identifying the total amount."""
        # TOTAL, AMOUNT keywords
        # Number format validation
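
The line-grouping step could be sketched as follows. This is one possible implementation under stated assumptions, not the notebook's code: boxes are `[x0, y0, x1, y1]`, a word joins the current line when its vertical center is within `y_threshold` of the line's first word, and the function returns only the words of each line.

```python
def group_by_lines(words, bboxes, y_threshold=5):
    """Group words into text lines by vertical center, then sort left to right."""
    def y_center(box):
        return (box[1] + box[3]) / 2

    # Visit words from top to bottom so each line is built in one pass.
    items = sorted(zip(words, bboxes), key=lambda wb: y_center(wb[1]))
    lines = []
    for word, box in items:
        if lines and abs(y_center(box) - lines[-1]["y"]) <= y_threshold:
            lines[-1]["items"].append((word, box))
        else:
            lines.append({"y": y_center(box), "items": [(word, box)]})

    # Within each line, order the words by their left edge.
    return [
        [w for w, _ in sorted(line["items"], key=lambda wb: wb[1][0])]
        for line in lines
    ]


lines = group_by_lines(
    ["TOTAL", "12.00", "ABC", "SDN"],
    [[10, 100, 60, 110], [200, 102, 240, 112], [50, 10, 80, 20], [10, 11, 40, 21]],
)
```

Here `lines` comes out as `[["SDN", "ABC"], ["TOTAL", "12.00"]]`. The entity rules (`find_company_line` and the others) would then operate on these lines rather than on individual words.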

4. Conversion to a Hugging Face dataset

from datasets import Array2D, Dataset, Features, Image, Sequence

def convert_to_hf_dataset():
    # Define the dataset features
    features = Features({
        'image': Image(),
        'label': Sequence(...),
        'words': Sequence(...),
        'bbox': Array2D(...),
    })

    # Create the Dataset objects
    train_dataset = Dataset.from_dict(...)
    eval_dataset = Dataset.from_dict(...)

    # Upload to the Hugging Face Hub
    dataset.push_to_hub("Dongwookss/SROIE")

model_training_main.ipynb

Model training and inference

1. Data and model preparation

from datasets import load_dataset
from transformers import AutoProcessor

# Load the dataset
dataset = load_dataset("Dongwookss/SROIE_lb1")
processor = AutoProcessor.from_pretrained("microsoft/layoutlmv3-base")

# Label definitions
label_list = ["S-COMPANY", "S-DATE", "S-ADDRESS", "S-TOTAL", "O"]
id2label = dict(enumerate(label_list))

2. Data preprocessing

def prepare_examples(examples, window_size=384, stride=192):
    """Process the data with a sliding window."""

    # Input handling
    # - Convert the image format (RGB)
    # - Tokenize the text
    # - Normalize the bounding boxes

    # Apply the sliding window
    # - Split the document with window_size=384
    # - Set the overlap with stride=192
    # - Tokenize and pad

    # Output format
    # - pixel_values
    # - input_ids
    # - attention_mask
    # - bbox
    # - labels
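
The windowing and padding steps can be illustrated on already-tokenized sequences. The sketch below is a simplified stand-in for `prepare_examples`: it leaves out `pixel_values` and special tokens, and the defaults `pad_id=1` and the all-zero padding box are assumptions. The label value `-100` is the index PyTorch's cross-entropy loss ignores by default. The function name `make_windows` is hypothetical.

```python
def make_windows(input_ids, bboxes, labels, window_size=384, stride=192,
                 pad_id=1, pad_box=(0, 0, 0, 0), ignore_label=-100):
    """Split token-level sequences into overlapping, fixed-length windows."""
    n = len(input_ids)
    windows = []
    start = 0
    while True:
        end = min(start + window_size, n)
        pad = window_size - (end - start)   # padding needed for a short last window
        windows.append({
            "input_ids": input_ids[start:end] + [pad_id] * pad,
            "attention_mask": [1] * (end - start) + [0] * pad,
            "bbox": [list(b) for b in bboxes[start:end]]
                    + [list(pad_box) for _ in range(pad)],
            "labels": labels[start:end] + [ignore_label] * pad,
        })
        if end >= n:                        # this window reached the end
            break
        start += stride
    return windows


ids = list(range(500))
windows = make_windows(ids, [(i, i, i, i) for i in ids], [0] * 500)
```

With 500 tokens this yields two windows: tokens 0-383, and tokens 192-499 followed by 76 padding positions.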

3. Model training

from transformers import Trainer, TrainingArguments

# Training configuration
training_args = TrainingArguments(
    output_dir="test",
    max_steps=1500,
    per_device_train_batch_size=2,
    learning_rate=2e-5,
    eval_strategy="steps",  # replaces the deprecated evaluation_strategy (transformers >= 4.41)
    eval_steps=100,
    metric_for_best_model="f1"
)

# Evaluation metrics
def compute_metrics(p):
    """seqeval-based evaluation."""
    # Returns:
    # - precision
    # - recall
    # - f1
    # - accuracy

# Model training
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    compute_metrics=compute_metrics
)

trainer.train()

4. Inference and result processing

def process_with_sliding_window(example, model, processor):
    """Sliding-window inference."""

    # Window handling
    def safe_process_window():
        # Process a single window
        # Track word_ids
        # Store the predictions
        ...

    # Whole-document processing
    # - Process the first window
    # - Check for unprocessed words
    # - Process the second window
    # - Merge the results

    # Post-processing
    # - Clean up the predicted labels
    # - Format the results
    # - Save to a CSV file
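
The merge step above can be sketched as follows, assuming each window yields token-level `(word_id, label)` pairs in which `word_id` is the word's index in the whole document and is `None` for special tokens. Keeping the first prediction seen for each word mirrors the outline (process the first window, then fill in only the words it did not cover), but the function and its input format are illustrative, not the notebook's implementation.

```python
def merge_window_predictions(num_words, window_outputs, default_label="O"):
    """Merge per-window token predictions into one label per word.

    window_outputs: list of windows, each a list of (word_id, label) pairs.
    The first prediction seen for a word wins, so earlier windows and the
    first sub-token of each word take priority.
    """
    merged = [None] * num_words
    for tokens in window_outputs:
        for word_id, label in tokens:
            if word_id is None:            # special or padding token
                continue
            if merged[word_id] is None:    # only fill words not yet processed
                merged[word_id] = label
    # Words no window covered fall back to the default label.
    return [label if label is not None else default_label for label in merged]


first = [(None, "O"), (0, "S-COMPANY"), (0, "O"), (1, "O")]
second = [(1, "S-DATE"), (2, "S-DATE"), (3, "S-TOTAL")]
labels = merge_window_predictions(5, [first, second])
```

In this example word 1 keeps the first window's `O` even though the second window predicts `S-DATE`, and word 4, which no window covers, falls back to `O`.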

5. Saving and deploying the model

def save_and_upload():
    """Save the model and upload it to the Hugging Face Hub."""

    # Save the model
    trainer.save_model()

    # Upload to the Hugging Face Hub
    upload_folder_to_huggingface(
        folder_path="test/checkpoint-1500",
        repo_id="Dongwookss/vie_task",
        token=HF_TOKEN
    )

4. Approach

4.1 Initial Analysis and Model Selection

  1. Data analysis

    • The starting assumption was that a good model needs a good dataset.
    • The dataset was built from the txt files and the labels were checked by hand, but the problems in the dataset were discovered later in the project rather than at the start.
    • Analyzed the data structure and quality, checking the labels directly with look_data.ipynb
      • Found that the accuracy of the data labels was low
    • Analyzed labeling patterns
  2. Model selection

| Aspect | LayoutLM | LayoutLMv2 | LayoutLMv3 |
|---|---|---|---|
| Architecture | BERT-based + 2D position embeddings | LayoutLM + visual embeddings | Unified Transformer-based architecture |
| Key features | Integrates text and layout information; simple structure | Introduces a visual backbone; text-image alignment | Single multimodal encoder; WPA (Word-Patch Alignment) |
| Strengths | High training efficiency; fast inference | Uses image features; improved performance | Efficient unified processing; top-level performance; efficient use of compute |
| Limitations | Does not use image features; limited modeling | Complex structure; high training cost | Large model size; high memory requirements |
| Performance | Form Understanding, FUNSD: 79.3% | Form Understanding, FUNSD: 82.8% | Form Understanding, FUNSD: 85.4% |
| Selected | ❌ | ❌ | ✅ |
| Reason | - | - | Single-encoder efficiency; better information integration; latest pre-training techniques; SOTA performance |
  3. The need for fine-tuning

| Aspect | LayoutLMv3 (base) | LayoutLMv3 (fine-tuned) |
|---|---|---|
| Performance | F1: 9.64; EM: 0.00; EM_no_space: 0.00 | F1: 84.96; EM: 50.43; EM_no_space: 50.43 |
| Analysis | Pre-trained only; no training on the SROIE task; low zero-shot performance | Fine-tuned on SROIE data; task-specific training completed; high performance achieved |

The gap between the base and fine-tuned models shows how much domain-specific training matters.
The pre-trained LayoutLMv3 base model has general document understanding ability, but the results confirm that without fine-tuning on the specific domain (receipts) and task (information extraction) it is hard to reach practical performance.

4.2 Development Process

1) Baseline implementation (v1-v2)

training_args = TrainingArguments(
    output_dir="test",
    max_steps=1000,  # v2: 1500
    learning_rate=1e-5,  # v2: 2e-5
    eval_strategy="steps",  # replaces the deprecated evaluation_strategy (transformers >= 4.41)
    eval_steps=100,
    load_best_model_at_end=True,
    metric_for_best_model="f1"
)
  • Built the dataset and ran baseline training
  • Tuned the training parameters

2) Introducing the Sliding Window (v3-v5)

  • Problems found

    • v3
      • Comparing output.csv with the input data showed that the number of entries did not match.
      • To find the cause, the number of words processed per file was compared using print statements.
      • This showed that the tokens for the image and word information of X51007846283 exceeded 512, so the excess never appeared in the output.
      • Issue: the position embedding limit of 514 tokens (two of the 514 positions are reserved by the RoBERTa-style padding offset, which leaves 512 usable tokens)
      • The processor's max_length was changed to 2024 and the model was trained again.
    • v4
      • Even with a larger max_length, words were still not recognized once the token count rose past a certain point.
      • This suggested that the position embeddings also limit the number of tokens, so a new technique was applied.
      • Running inference on the v3 model with the sliding_window technique improved the evaluation metrics.
      • v3 f1 : 80.4974 -> v4+sliding_window f1 : 84.9540
      • v5 was then carried out with the goal of applying the sliding window from the training stage onward.
    • v5
      • The sliding_window was also applied to the train and test sets of the Dataset before training.
  • Sliding Window implementation

| Item | Before | After |
|---|---|---|
| Processing | Whole document in a single pass | Split into overlapping windows (window_size=384, stride=192) |
| Advantages | Simple to implement; easy to preserve context; memory efficient | Can handle long documents; prevents token loss; overcomes the position embedding limit |
| Disadvantages | Long documents are truncated; position embedding limit; token loss | More complex implementation; overlapping predictions must be reconciled; context may be lost at window boundaries |
| Performance | F1: 80.4974; EM: 49.7839 | F1: 84.9650; EM: 50.4323 |
| Memory usage | Low | Higher because of duplicated processing |
| Processing speed | Fast | Slower because overlapping regions are processed more than once |
| Use cases | Short documents; simple layouts | Long documents; complex layouts; when precise information extraction is needed |
  • Implementation details
    # Sliding window parameters
    window_size = 384               # size of a single window
    stride = 192                    # step between window starts
    overlap = window_size - stride  # overlap between adjacent windows (192)

    # Window processing logic
    def process_with_sliding_window(text):
        windows = []
        for i in range(0, len(text), stride):
            windows.append(text[i:i + window_size])
            if i + window_size >= len(text):
                break  # this window already reaches the end of the text
        return windows

3) Data quality improvement (v6)

  • Label improvement
    • Corrected the labels using the entities information
    • Using the OCR text, extracted the bbox that contains each label value, then re-assigned labels to the words inside the bounding box containing the matching sentence or word
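
As a rough illustration of the relabeling idea, the sketch below tags the first contiguous run of OCR words whose joined text equals an entity value taken from the entities files. It is a simplification under stated assumptions: matching ignores case and whitespace, and it works on the flat word list rather than on bounding-box line groups as the v6 procedure describes. The function name and sample data are hypothetical.

```python
def relabel_from_entity(words, labels, entity_value, tag):
    """Return a copy of labels with the words matching entity_value set to tag."""
    target = "".join(entity_value.split()).upper()
    new_labels = list(labels)
    if not target:
        return new_labels
    for start in range(len(words)):
        joined = ""
        for end in range(start, len(words)):
            joined += "".join(words[end].split()).upper()
            if not target.startswith(joined):
                break                        # this run can no longer match
            if joined == target:
                for i in range(start, end + 1):
                    new_labels[i] = tag
                return new_labels
    return new_labels                        # no match: labels unchanged


words = ["ABC", "SDN", "BHD", "NO.", "1", "TOTAL", "12.00"]
labels = ["O"] * len(words)
labels = relabel_from_entity(words, labels, "ABC SDN BHD", "S-COMPANY")
labels = relabel_from_entity(words, labels, "12.00", "S-TOTAL")
```

After both calls the first three words carry `S-COMPANY` and the last word carries `S-TOTAL`. A value that never appears in the OCR words leaves the labels untouched.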

4.3 Performance Optimization

training_args = TrainingArguments(
    output_dir="test",
    max_steps=1500,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    learning_rate=2e-5,
    gradient_accumulation_steps=4
)
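
The arithmetic behind these settings is worth spelling out: with gradient accumulation, each optimizer step covers several forward passes, and `max_steps` counts optimizer steps. The snippet below works this out, assuming a single GPU (the report mentions one L4 on Colab, but the device count is not stated explicitly).

```python
per_device_train_batch_size = 2
gradient_accumulation_steps = 4
max_steps = 1500
num_devices = 1  # assumption: a single GPU

# Examples contributing to each optimizer update.
effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)

# Training examples (windows, when the sliding window is used) seen in total.
samples_seen = effective_batch_size * max_steps

print(effective_batch_size, samples_seen)
```

So a per-device batch of 2 behaves like a batch of 8 for the optimizer while keeping memory use at the level of 2 examples, and 1500 steps cover 12,000 training examples.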

5. Conclusion and Future Work

5.1 Key Results

  1. Model results and performance

    • Submitted model: v5
    • F1 score: 84.9650
    • Accuracy (EM): 50.4323
  2. Technical results

    • Successfully implemented the sliding window technique
    • Established a methodology for improving data quality

5.2 Limitations

  1. Constraints on processing long text
  2. Performance varies by field, so a new data processing procedure is needed

5.3 Future Improvements

  1. More advanced data processing

    • Approaches other than exception handling or rule-based methods need to be applied.
  2. Model optimization

    • Some performance gain from tuning the sliding window parameters
    • Building training data with guaranteed quality for the target labels through field-specific models
  3. System stability

    • Systematizing per-field validation rules
