[LangChain/node.js] Document Summarization with LangChain (loadSummarizationChain)


자라나라 개발머리


iammindy 2024. 5. 25. 15:11

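Today I'd like to build on the LangChain basics I covered last time and put them to use.

As it happens, LangChain has a chain that implements document summarization with the map-reduce structure I studied the time before that.

If you need a refresher on the LangChain basics or a deeper explanation of MapReduce, I recommend the two posts below.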

[LangChain] Digging into the Bare Basics of LangChain (Advantages, LECL)

growupdevmind.tistory.com

[Distributed Data Processing] A Quick Explanation of MapReduce / Implementing a Mapper and Reducer

growupdevmind.tistory.com

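Getting to the point: today we'll implement document summarization with LangChain's loadSummarizationChain!

As the name suggests, loadSummarizationChain is a chain that helps you summarize documents with an LLM.

Document summarization matters for LLMs because each request is limited in the number of tokens it can carry.

So when a request would contain a very long document, you need a strategy: either split the content across several requests or condense it before sending.

LangChain offers four approaches to document summarization, and today we'll look at the mapReduce approach. (If you're curious about the others, I've linked a blog post that explains them well at the very bottom of this post!)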
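To make the "split it up" idea concrete, here is a minimal sketch in TypeScript. It's a naive fixed-size splitter written just for illustration (splitIntoChunks is a hypothetical helper, not a LangChain API); it cuts by character count, while real token limits are counted in tokens, and real splitters usually try to break on natural boundaries.

```typescript
// Naive fixed-size splitter: an illustrative stand-in, NOT LangChain's splitter.
// Assumption: we cut by character count; real limits are measured in tokens.
function splitIntoChunks(text: string, chunkSize: number): string[] {
  if (chunkSize <= 0) {
    throw new Error("chunkSize must be positive");
  }
  const pieces: string[] = [];
  for (let start = 0; start < text.length; start += chunkSize) {
    // slice() clamps the end index, so the last piece may be shorter
    pieces.push(text.slice(start, start + chunkSize));
  }
  return pieces;
}

const longText = "a".repeat(2500);
const chunks = splitIntoChunks(longText, 1000);
console.log(chunks.map((c) => c.length)); // [ 1000, 1000, 500 ]
```

Each chunk can then be sent as its own request, which is exactly what the map step of mapReduce relies on.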

mapReduce

Here is the system flow for map-reduce based document summarization. I'll walk through the mapReduce process using this diagram.

1. map

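We chop one large document into small pieces and send a request to the LLM for each piece. The chopping has to happen beforehand, as a preprocessing step!

If you look closely at the diagram, four boxes are stacked on top of each other. That means four different documents are each sent to the model with the same prompt; in other words, the same step runs four times.

The map prompt here is "Extract all feature requests from comment: I think that...".

The part after the colon ("I think that...") is the comment text itself, so for each comment the LLM extracts the feature requests it contains and returns them.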

2. reduce

Four responses come back, one per model call, and in the reduce step we merge those four documents into one.

We then send one final request to the model with that merged document.

Here the prompt is "What are the top feature request: Add foo... Add bar...".

The model's response to this request is the final answer.

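Easier than you expected, right? To sum up briefly:

1. Split a very long document into n pieces,

2. map: send each piece to the LLM with a given prompt, and

3. reduce: merge the responses into a single document and send it to the LLM again with the final prompt to get the answer.

That's all there is to map-reduce summarization!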
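The three steps above can be sketched in a few lines of TypeScript. This is only a sketch under some assumptions: the "LLM" is a fake function that records the prompts it receives, the calls are synchronous (real model calls would be async), and the names mapReduceSummarize and fakeLlm are hypothetical, not part of LangChain.

```typescript
// A stand-in for a model call: takes a prompt, returns a completion.
type Llm = (prompt: string) => string;

function mapReduceSummarize(
  pieces: string[],
  llm: Llm,
  mapPrompt: (text: string) => string,
  reducePrompt: (text: string) => string
): string {
  // map: one request per piece, all built from the same prompt template
  const partials = pieces.map((piece) => llm(mapPrompt(piece)));
  // reduce: merge the partial answers and send one final request
  const merged = partials.join("\n\n");
  return llm(reducePrompt(merged));
}

// Hypothetical fake model: remembers every prompt and numbers its answers.
const calls: string[] = [];
const fakeLlm: Llm = (prompt) => {
  calls.push(prompt);
  return `answer #${calls.length}`;
};

const finalAnswer = mapReduceSummarize(
  ["chunk A", "chunk B", "chunk C"],
  fakeLlm,
  (text) => `Summarize: ${text}`,
  (text) => `Combine these summaries: ${text}`
);

console.log(calls.length); // 4 (three map calls + one reduce call)
console.log(finalAnswer); // answer #4
```

Note that n pieces always cost n + 1 model calls here: one per piece in the map step, plus the final reduce call.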

So how do we implement this in code with LangChain? Let's take a look.

loadSummarizationChain

import { OpenAI } from "@langchain/openai";
import { loadSummarizationChain } from "langchain/chains";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import * as fs from "fs";

const text = fs.readFileSync("state_of_the_union.txt", "utf8"); // load the txt file
const model = new OpenAI({ temperature: 0 }); // initialize the model
const textSplitter = new RecursiveCharacterTextSplitter({ chunkSize: 1000 }); // initialize the splitter used to chunk the document
const docs = await textSplitter.createDocuments([text]); // split the document

const chain = loadSummarizationChain(model, { type: "map_reduce" }); // create a loadSummarizationChain that uses the map_reduce method

const res = await chain.invoke({
  input_documents: docs,
}); // run the chain

console.log({ res });

So easy! Because we're using a chain LangChain has already built, we can use the feature without even needing to know how mapReduce works. This is one of LangChain's advantages, off-the-shelf chains: ready-made chains you can just pick up and use.

But we can't stop here, can we? We ought to understand how it works.

So let's take the loadSummarizationChain code apart.

export const loadSummarizationChain = (
  llm: BaseLanguageModelInterface,
  params: SummarizationChainParams = { type: "map_reduce" }
) => {
  const { verbose } = params;
  // (omitted)
  if (params.type === "map_reduce") {
    const {
      combineMapPrompt = DEFAULT_PROMPT,
      combinePrompt = DEFAULT_PROMPT,
      combineLLM,
      returnIntermediateSteps,
    } = params;
    const llmChain = new LLMChain({ prompt: combineMapPrompt, llm, verbose });
    const combineLLMChain = new LLMChain({
      prompt: combinePrompt,
      llm: combineLLM ?? llm,
      verbose,
    });
    const combineDocumentChain = new StuffDocumentsChain({
      llmChain: combineLLMChain,
      documentVariableName: "text",
      verbose,
    });
    const chain = new MapReduceDocumentsChain({
      llmChain,
      combineDocumentChain,
      documentVariableName: "text",
      returnIntermediateSteps,
      verbose,
    });
    return chain;
  }
  // (omitted)
  throw new Error(`Invalid _type: ${params.type}`);
};

So here is the source of the loadSummarizationChain function. I've only brought the part related to mapReduce.

Let's look at the parameters first.

export const loadSummarizationChain = (
  llm: BaseLanguageModelInterface,
  params: SummarizationChainParams = { type: "map_reduce" }
)

When we used the chain earlier,

const chain = loadSummarizationChain(model, { type: "map_reduce" });

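we called it like this, and that's exactly what goes in as the parameters! We can also see that { type: "map_reduce" } is the default value of params, so map_reduce is used even when you leave the second argument out.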
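If default parameter values are new to you, here is a tiny sketch of the same pattern. Everything in it is made up for illustration (pickStrategy, SketchParams, and the set of type names are assumptions, not LangChain's actual types); it only shows how a default kicks in when the argument is omitted.

```typescript
// Hypothetical types for illustration only; not LangChain's real definitions.
type SketchType = "map_reduce" | "stuff" | "refine";

interface SketchParams {
  type: SketchType;
}

const descriptions: Record<SketchType, string> = {
  map_reduce: "summarize each chunk, then summarize the summaries",
  stuff: "put every document into a single prompt",
  refine: "update a running summary one document at a time",
};

// Same shape as the source: params defaults to { type: "map_reduce" }.
function pickStrategy(params: SketchParams = { type: "map_reduce" }): string {
  return descriptions[params.type];
}

const byDefault = pickStrategy(); // no argument, so the default is used
const explicit = pickStrategy({ type: "stuff" });

console.log(byDefault); // summarize each chunk, then summarize the summaries
console.log(explicit); // put every document into a single prompt
```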

if (params.type === "map_reduce") {
    const {
      combineMapPrompt = DEFAULT_PROMPT,
      combinePrompt = DEFAULT_PROMPT,
      combineLLM,
      returnIntermediateSteps,
    } = params;
    const llmChain = new LLMChain({ prompt: combineMapPrompt, llm, verbose });
    const combineLLMChain = new LLMChain({
      prompt: combinePrompt,
      llm: combineLLM ?? llm,
      verbose,
    });
    const combineDocumentChain = new StuffDocumentsChain({
      llmChain: combineLLMChain,
      documentVariableName: "text",
      verbose,
    });
    const chain = new MapReduceDocumentsChain({
      llmChain,
      combineDocumentChain,
      documentVariableName: "text",
      returnIntermediateSteps,
      verbose,
    });
    return chain;
  }

This is what runs when type is map_reduce. This is the heart of it.

It declares four chains in total.

1. llmChain: the chain that performs the map step

2. combineLLMChain, combineDocumentChain: the chains that perform the reduce step

3. chain: the chain that runs the whole map-reduce process

They can be grouped like this.

1. llmChain

 const llmChain = new LLMChain({ prompt: combineMapPrompt, llm, verbose });

This is a basic chain. It's built from the prompt and model needed for the map step.

verbose is the parameter to use when you want detailed logs. With verbose = true, LangChain logs each step as it runs.

2. combineLLMChain, combineDocumentChain

const combineLLMChain = new LLMChain({
      prompt: combinePrompt,
      llm: combineLLM ?? llm,
      verbose,
    });
const combineDocumentChain = new StuffDocumentsChain({
      llmChain: combineLLMChain,
      documentVariableName: "text",
      verbose,
    });

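Like llmChain, combineLLMChain is a chain built from the prompt and model needed for the reduce step. (If combineLLM isn't given, it falls back to the same llm.)

This chain, though, is then passed as a parameter to StuffDocumentsChain.

So what is StuffDocumentsChain?

StuffDocumentsChain combines the individual documents so they can be passed to the model together.

It merges all the documents it receives as input into one and passes the result to combineLLMChain.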
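The "stuffing" idea itself is simple enough to sketch by hand. The code below is my own simplified illustration, not the library's implementation: stuffDocuments and Doc are hypothetical names, and joining with a blank line and using a {text} placeholder are assumptions chosen to mirror documentVariableName: "text" from the source above.

```typescript
// Minimal document shape for this sketch (hypothetical, for illustration).
interface Doc {
  pageContent: string;
}

// Merge every document into one string and drop it into the prompt template
// where the {documentVariableName} placeholder sits.
function stuffDocuments(
  docs: Doc[],
  template: string,
  documentVariableName = "text"
): string {
  const combined = docs.map((doc) => doc.pageContent).join("\n\n");
  // split/join replaces every placeholder occurrence and avoids the special
  // "$" patterns that String.prototype.replace would interpret
  return template.split(`{${documentVariableName}}`).join(combined);
}

const stuffed = stuffDocuments(
  [{ pageContent: "Add foo" }, { pageContent: "Add bar" }],
  "What are the top feature requests?\n{text}"
);

console.log(stuffed);
// What are the top feature requests?
// Add foo
//
// Add bar
```

The resulting string is what would be handed to the reduce-step model as a single prompt.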

3. chain

const chain = new MapReduceDocumentsChain({
      llmChain,
      combineDocumentChain,
      documentVariableName: "text",
      returnIntermediateSteps,
      verbose,
    });

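And finally, the chains created for each step are passed as parameters to the long-awaited MapReduceDocumentsChain. documentVariableName is passed here too. returnIntermediateSteps is the parameter that makes the results of the intermediate steps appear in the output as well.

With that, MapReduceDocumentsChain directs the whole thing and carries out the mapReduce process. If you're curious about the code inside, see the page below. 😁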
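Before diving into the real source, here is a rough sketch of what such an orchestrator does with returnIntermediateSteps and the combineLLM ?? llm fallback. It is a simplification under stated assumptions: runMapReduce, the fake models, the hard-coded prompts, and the output keys text and intermediateSteps are all chosen for illustration, and the calls are synchronous while real model calls would be async.

```typescript
type Model = (prompt: string) => string;

interface MapReduceOptions {
  llm: Model; // used for the map step
  combineLLM?: Model; // optional separate model for the reduce step
  returnIntermediateSteps?: boolean;
}

interface MapReduceResult {
  text: string;
  intermediateSteps?: string[];
}

function runMapReduce(pieces: string[], options: MapReduceOptions): MapReduceResult {
  const { llm, combineLLM, returnIntermediateSteps = false } = options;
  const reducer = combineLLM ?? llm; // same fallback as in the source above
  const mapped = pieces.map((piece) => llm(`Summarize: ${piece}`));
  const text = reducer(`Combine: ${mapped.join("\n\n")}`);
  // Only expose the per-piece results when the caller asked for them
  return returnIntermediateSteps ? { text, intermediateSteps: mapped } : { text };
}

// Hypothetical fake models so the sketch runs without any API key.
const fakeMapModel: Model = () => "partial";
const fakeReduceModel: Model = (prompt) =>
  `final from ${prompt.split("\n\n").length} partials`;

const withSteps = runMapReduce(["a", "b", "c"], {
  llm: fakeMapModel,
  combineLLM: fakeReduceModel,
  returnIntermediateSteps: true,
});
const plain = runMapReduce(["a"], { llm: fakeMapModel });

console.log(withSteps.text); // final from 3 partials
console.log(withSteps.intermediateSteps); // [ 'partial', 'partial', 'partial' ]
console.log(plain.text); // partial
```

Seeing the intermediate results is handy when the final summary looks off and you want to check which chunk's partial summary went wrong.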

langchainjs/langchain/src/chains/combine_docs_chain.ts at 238e093b92a7f26f74937c0ba78c1d69c542891c · langchain-ai/langchainjs


github.com

So today we used LangChain's loadSummarizationChain to summarize a document.

I feel like I got a glimpse of just how versatile LangChain is.

I realized I could easily implement the features I want using the chains LangChain provides, or build my own custom chain on top of them.

Also, the fact that a custom chain I build can work with any model just by swapping the model made me think: ah, this is why it's a powerful framework for turning LLMs into applications! I finally understand what LangChain is for.

That's all for today. Thanks for reading.😊

reference

https://js.langchain.com/v0.1/docs/modules/chains/popular/summarize/

https://js.langchain.com/v0.1/docs/modules/chains/document/map_reduce/

https://v02.api.js.langchain.com/functions/langchain_chains.loadSummarizationChain.html

https://jiniai.biz/?p=2948 (Jini AI post on LangChain's document processing strategies: stuffing, map-reduce, refine, and map rerank)
