I have been trying to learn machine learning to develop AI applications, and I was confused about where to start. Documenting a few basics of AI here.
Embedding models play an important role in determining the quality of your machine learning application.
Starting point
https://python.langchain.com/docs/tutorials/llm_chain/
The easiest first step in learning to build an LLM application is integrating with LLM providers like OpenAI and Anthropic using LangChain ChatModels. ChatModels provide a provider-specific interface that accepts messages and returns responses in a structured format.
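To make the messages-in / structured-response-out interface concrete, here is a conceptual sketch in plain Python. This is a mock, not real LangChain code; all class and field names here are made up for illustration.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Message:
    role: str      # "system", "user", or "assistant"
    content: str

@dataclass
class ChatResponse:
    content: str
    model: str

class FakeChatModel:
    """Mimics the ChatModel contract without calling any provider."""
    def __init__(self, model: str):
        self.model = model

    def invoke(self, messages: List[Message]) -> ChatResponse:
        # A real ChatModel would send `messages` to the provider here
        # and parse the provider's reply into a structured response.
        last_user = next(m for m in reversed(messages) if m.role == "user")
        return ChatResponse(content=f"Echo: {last_user.content}", model=self.model)

chat = FakeChatModel(model="fake-model-1")
reply = chat.invoke([
    Message(role="system", content="You are a helpful assistant."),
    Message(role="user", content="Hello!"),
])
print(reply.content)  # Echo: Hello!
```

The point is the shape of the contract: a list of role-tagged messages goes in, a structured response object comes out, regardless of which provider backs the model.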
Caching is a technique to limit the cost of calling the LLM provider. If there are questions that users ask frequently, the answers can be stored in a cache and served without another provider call.
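A conceptual sketch of that idea in plain Python (not the LangChain cache API; `CachedLLM` and `ask` are invented names): repeated questions hit a dict instead of the provider.

```python
class CachedLLM:
    def __init__(self, llm_call):
        self.llm_call = llm_call      # the (expensive) provider call
        self.cache = {}
        self.provider_calls = 0

    def ask(self, question: str) -> str:
        if question in self.cache:
            return self.cache[question]        # cache hit: no provider cost
        self.provider_calls += 1               # cache miss: pay once
        answer = self.llm_call(question)
        self.cache[question] = answer
        return answer

# Stand-in for a real provider call.
llm = CachedLLM(lambda q: f"answer to: {q}")
llm.ask("What is RAG?")
llm.ask("What is RAG?")     # second call is served from the cache
print(llm.provider_calls)   # 1
```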
Prompt templates are a way of instructing the model using the user input and a set of input parameters. Models (like those from OpenAI) create output based on the prompt provided by the application.
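At its core a prompt template is just a fixed instruction string with placeholders filled in at runtime. A minimal sketch using plain Python string formatting (LangChain's `ChatPromptTemplate` wraps the same idea; the template text below is made up):

```python
TEMPLATE = (
    "You are a helpful assistant.\n"
    "Answer the question using only the given context.\n"
    "Context: {context}\n"
    "Question: {question}"
)

def build_prompt(question: str, context: str) -> str:
    # Fill the placeholders with the runtime inputs.
    return TEMPLATE.format(question=question, context=context)

prompt = build_prompt("What is a context window?",
                      "A context window is the maximum text an LLM can process.")
print(prompt)
```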
Why RAG?
LLMs have been trained on public data but have never seen your private data. Connecting an LLM to external data is the need of the hour.
Document: a piece of information that is fed to the LLM. Documents are split into smaller pieces called chunks, because an LLM can only process a limited amount of text at one time. The broken-down chunks are themselves also referred to as documents.
Context Window: the maximum amount of text an LLM can process at once.
Embeddings: to make retrieval easy, each chunk is vectorized, i.e., converted to a numerical representation in a high-dimensional space. These embeddings are stored in a vector database, and they capture the semantic meaning of the chunk.
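A toy illustration of why embeddings enable retrieval: chunks with similar meaning end up as nearby vectors, and closeness is typically measured with cosine similarity. Real embeddings have hundreds or thousands of dimensions; the 3-dimensional vectors below are made up for the example.

```python
import math

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

cat    = [0.9, 0.1, 0.0]   # hypothetical embedding for "cat"
kitten = [0.8, 0.2, 0.1]   # close in meaning, so close in space
car    = [0.1, 0.9, 0.3]   # different meaning, so farther away

print(cosine_similarity(cat, kitten) > cosine_similarity(cat, car))  # True
```

A vector database does this comparison at scale: given a query embedding, it returns the stored chunks whose vectors score highest.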
https://python.langchain.com/docs/tutorials/rag/
A RAG application follows these steps:
Indexing: building a pipeline to load the data from the source and index it
Retrieval and Generation: the chain of actions the LLM performs using the indexed data
Load the data from the source using DocumentLoaders
Use text splitters to split the document into chunks. These chunks will be used to index the data and are fed into the LLM
Store the split documents in the VectorStore using an Embeddings model
Use the retriever, based on the user input, to retrieve the relevant chunks
Generate: the LLM produces a response using both the question and the retrieved data
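The steps above can be sketched end to end in plain Python. This is a toy, with simple keyword overlap standing in for embedding similarity; all function names and the sample text are made up for illustration.

```python
def split_into_chunks(text: str, chunk_size: int = 50):
    # Naive fixed-size splitter; real text splitters respect word and
    # sentence boundaries.
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def retrieve(question: str, chunks, k: int = 1):
    # Score each chunk by how many question words it shares with it.
    q_words = set(question.lower().split())
    scored = sorted(chunks,
                    key=lambda c: len(q_words & set(c.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(question: str, context_chunks):
    context = "\n\n".join(context_chunks)
    return f"Context: {context}\nQuestion: {question}"

source = ("Embeddings capture semantic meaning. "
          "A context window limits how much text an LLM can read at once.")
chunks = split_into_chunks(source)          # index: load + split
top = retrieve("what is a context window", chunks)   # retrieval
prompt = build_prompt("what is a context window", top)  # generation input
print(prompt)
```

In the real pipeline, `prompt` would then be sent to the chat model to produce the final answer.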
An LLM application built with LangGraph requires a graph that keeps track of the different states the application goes through.
To define the graph, we need the following:
Define the state of the application
Nodes or the application steps
Control flow of the application
State
Controls what data is input to the application, what is transferred between the steps, and what is returned in the response
from langchain_core.documents import Document
from typing_extensions import List, TypedDict

class State(TypedDict):
    question: str
    context: List[Document]
    answer: str
Application Steps
The sequence of steps an LLM application goes through, for example:
def retrieve(state: State):
    retrieved_docs = vector_store.similarity_search(state["question"])
    return {"context": retrieved_docs}

def generate(state: State):
    docs_content = "\n\n".join(doc.page_content for doc in state["context"])
    messages = prompt.invoke({"question": state["question"], "context": docs_content})
    response = llm.invoke(messages)
    return {"answer": response.content}
The retrieve step simply runs a similarity search using the input question.
The generate step formats the retrieved context and the original question into a prompt for the chat model.
Control Flow
We compile our application into a single graph object. In this case, we are just connecting the retrieval and generation steps into a single sequence.
from langgraph.graph import START, StateGraph
graph_builder = StateGraph(State).add_sequence([retrieve, generate])
graph_builder.add_edge(START, "retrieve")
graph = graph_builder.compile()
Testing the Application
Test the application by executing the graph object
result = graph.invoke({"question": "What is Task Decomposition?"})
print(f'Context: {result["context"]}\n\n')
print(f'Answer: {result["answer"]}')