I have been trying to learn machine learning to develop AI applications, and I was confused about where to start. Documenting a few basics of AI here.
Embedding models play an important role in determining the quality of your machine learning application.
Starting point
https://python.langchain.com/docs/tutorials/llm_chain/
The easiest first step in learning to build an LLM application is integrating with LLM providers like OpenAI and Anthropic using LangChain ChatModels. ChatModels provide a provider-specific interface that accepts messages and returns responses in a structured format.
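To make the messages-in / structured-response-out interface concrete, here is a conceptual sketch in plain Python. This is a mock, not real LangChain code; all class and field names here are made up for illustration.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Message:
    role: str      # "system", "user", or "assistant"
    content: str

@dataclass
class ChatResponse:
    content: str
    model: str

class FakeChatModel:
    """Mimics the ChatModel contract without calling any provider."""
    def __init__(self, model: str):
        self.model = model

    def invoke(self, messages: List[Message]) -> ChatResponse:
        # A real ChatModel would send `messages` to the provider here
        # and parse the provider's reply into a structured response.
        last_user = next(m for m in reversed(messages) if m.role == "user")
        return ChatResponse(content=f"Echo: {last_user.content}", model=self.model)

chat = FakeChatModel(model="fake-model-1")
reply = chat.invoke([
    Message(role="system", content="You are a helpful assistant."),
    Message(role="user", content="Hello!"),
])
print(reply.content)  # Echo: Hello!
```

The point is the shape of the contract: a list of role-tagged messages goes in, a structured response object comes out, regardless of which provider backs the model.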
Caching is a technique to limit the cost of calling the LLM provider. If there are questions that users ask frequently, the answers can be stored in a cache and served without another provider call.
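A conceptual sketch of that idea in plain Python (not the LangChain cache API; `CachedLLM` and `ask` are invented names): repeated questions hit a dict instead of the provider.

```python
class CachedLLM:
    def __init__(self, llm_call):
        self.llm_call = llm_call      # the (expensive) provider call
        self.cache = {}
        self.provider_calls = 0

    def ask(self, question: str) -> str:
        if question in self.cache:
            return self.cache[question]        # cache hit: no provider cost
        self.provider_calls += 1               # cache miss: pay once
        answer = self.llm_call(question)
        self.cache[question] = answer
        return answer

# Stand-in for a real provider call.
llm = CachedLLM(lambda q: f"answer to: {q}")
llm.ask("What is RAG?")
llm.ask("What is RAG?")     # second call is served from the cache
print(llm.provider_calls)   # 1
```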
Prompt templates are a way of instructing the model using the user input and a set of input parameters. Models (like those from OpenAI) create output based on the prompt provided by the application.
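At its core a prompt template is just a fixed instruction string with placeholders filled in at runtime. A minimal sketch using plain Python string formatting (LangChain's `ChatPromptTemplate` wraps the same idea; the template text below is made up):

```python
TEMPLATE = (
    "You are a helpful assistant.\n"
    "Answer the question using only the given context.\n"
    "Context: {context}\n"
    "Question: {question}"
)

def build_prompt(question: str, context: str) -> str:
    # Fill the placeholders with the runtime inputs.
    return TEMPLATE.format(question=question, context=context)

prompt = build_prompt("What is a context window?",
                      "A context window is the maximum text an LLM can process.")
print(prompt)
```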
Why RAG?
LLMs have been trained on public data but have never seen your private data. Connecting an LLM to external data is the need of the hour.
Document: a piece of information that is fed to the LLM. Documents are split into smaller pieces called chunks, because an LLM can only process a limited amount of text at one time. The broken-down chunks are themselves also referred to as documents.
Context Window: the maximum amount of text an LLM can process at once.
Embeddings: to make retrieval easy, each chunk is vectorized, i.e., converted to a numerical representation in a high-dimensional space. These embeddings are stored in a vector database, and they capture the semantic meaning of the chunk.
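A toy illustration of why embeddings enable retrieval: chunks with similar meaning end up as nearby vectors, and closeness is typically measured with cosine similarity. Real embeddings have hundreds or thousands of dimensions; the 3-dimensional vectors below are made up for the example.

```python
import math

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

cat    = [0.9, 0.1, 0.0]   # hypothetical embedding for "cat"
kitten = [0.8, 0.2, 0.1]   # close in meaning, so close in space
car    = [0.1, 0.9, 0.3]   # different meaning, so farther away

print(cosine_similarity(cat, kitten) > cosine_similarity(cat, car))  # True
```

A vector database does this comparison at scale: given a query embedding, it returns the stored chunks whose vectors score highest.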
https://python.langchain.com/docs/tutorials/rag/
A RAG application follows these steps:
Indexing: building a pipeline to load the data from the source and index it
Retrieval and Generation: the chain of actions the LLM performs using the indexed data
Load the data from the source using DocumentLoaders
Use text splitters to split the document into chunks. These chunks will be used to index the data and are fed into the LLM
Store the split documents in the VectorStore using an Embeddings model
Use the retriever, based on the user input, to retrieve the relevant chunks
Generate: the LLM produces a response using both the question and the retrieved data
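The steps above can be sketched end to end in plain Python. This is a toy, with simple keyword overlap standing in for embedding similarity; all function names and the sample text are made up for illustration.

```python
def split_into_chunks(text: str, chunk_size: int = 50):
    # Naive fixed-size splitter; real text splitters respect word and
    # sentence boundaries.
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def retrieve(question: str, chunks, k: int = 1):
    # Score each chunk by how many question words it shares with it.
    q_words = set(question.lower().split())
    scored = sorted(chunks,
                    key=lambda c: len(q_words & set(c.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(question: str, context_chunks):
    context = "\n\n".join(context_chunks)
    return f"Context: {context}\nQuestion: {question}"

source = ("Embeddings capture semantic meaning. "
          "A context window limits how much text an LLM can read at once.")
chunks = split_into_chunks(source)          # index: load + split
top = retrieve("what is a context window", chunks)   # retrieval
prompt = build_prompt("what is a context window", top)  # generation input
print(prompt)
```

In the real pipeline, `prompt` would then be sent to the chat model to produce the final answer.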
An LLM application built with LangGraph requires a graph that keeps track of the different states the application goes through.
To define the graph, we need the following:
Define the state of the application
Nodes or the application steps
Control flow of the application
State
Controls what data is input to the application, what is transferred between the steps, and what is returned in the response
from langchain_core.documents import Document
from typing_extensions import List, TypedDict

class State(TypedDict):
    question: str
    context: List[Document]
    answer: str
Application Steps
The sequence of steps an LLM application goes through, for example:
def retrieve(state: State):
    retrieved_docs = vector_store.similarity_search(state["question"])
    return {"context": retrieved_docs}

def generate(state: State):
    docs_content = "\n\n".join(doc.page_content for doc in state["context"])
    messages = prompt.invoke({"question": state["question"], "context": docs_content})
    response = llm.invoke(messages)
    return {"answer": response.content}
The retrieve step simply runs a similarity search using the input question.
The generate step formats the retrieved context and the original question into a prompt for the chat model.
Control Flow
We compile our application into a single graph object. In this case, we are just connecting the retrieval and generation steps into a single sequence.
from langgraph.graph import START, StateGraph
graph_builder = StateGraph(State).add_sequence([retrieve, generate])
graph_builder.add_edge(START, "retrieve")
graph = graph_builder.compile()
Testing the Application
Test the application by executing the graph object
result = graph.invoke({"question": "What is Task Decomposition?"})
print(f'Context: {result["context"]}\n\n')
print(f'Answer: {result["answer"]}')