I created my first agentic AI more than two years ago. It is not some new technology, just an approach to software development using LLMs (GPT and similar). You don't need any frameworks or special AI knowledge for this; you just need to be a programmer. This article will show you how to design agents and which tasks they are suited for.
It's all based on two abilities of neural networks:
LLMs (not all of them) can return JSON; they are specifically trained for this
Programmers (not all of them) can decompose tasks
Basic Capabilities
Let's say you're creating a very ordinary program, where you hardcode all the logic as usual. You use an API to which you can send any text, and in response you get JSON in a fixed format. Congratulations, your program is an agentic AI (in the sense of the current hype). The AI does not control the logic and does not call databases. Your code receives JSON from the AI, and then everything works the old-fashioned way: you write IF-THEN-ELSE to query databases, to call the API again, and so on.
Let's start with a very simple example. A user has sent a comment, and we need to check it for insults:
//API request, just a text string:
User sent me a comment
Tell me if it contains a hidden or obvious insult
Send a response in JSON in the following format
{
"UserComment":"This is all some kind of nonsense",
"YourExplanation":"add your explanations to your answer here"
"InsultDetected":true/false
}
//The response from the API will be as follows:
{
"UserComment":"This is all some kind of nonsense",
"YourExplanation":"This text does not contain insults",
"InsultDetected":false
}
In the response you will get the attribute "InsultDetected", which is very convenient to use in your code. Ignore the "YourExplanation" attribute; I added it only to improve the quality of the answers. Since an LLM answers intuitively (not logically), you should be prepared for the following problems:
The JSON in the response may be broken, especially if you missed an important comma or quote in the request, as I did in the example above
The InsultDetected attribute can be either true or false for the same text. The more precisely you define what an insult is in the request, the less often the AI will make mistakes.
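The pattern above fits in a few lines of ordinary code. Here is a minimal sketch of how I would call the model defensively; `ask_llm` is a placeholder for whatever function sends a prompt to your model and returns its text reply, and the retry loop plus tolerant parsing handle the broken-JSON problem:

```python
import json

def parse_llm_json(raw):
    """Parse the model's reply, tolerating common breakage:
    markdown fences around the JSON or stray prose before/after it."""
    cleaned = raw.strip()
    if cleaned.startswith("```"):          # strip a ```json ... ``` fence
        cleaned = cleaned.strip("`")
        if cleaned.startswith("json"):
            cleaned = cleaned[4:]
    try:
        return json.loads(cleaned)
    except json.JSONDecodeError:
        # fall back to the outermost {...} span, if any
        start, end = cleaned.find("{"), cleaned.rfind("}")
        if start != -1 and end > start:
            try:
                return json.loads(cleaned[start:end + 1])
            except json.JSONDecodeError:
                return None
        return None

def check_comment(comment, ask_llm):
    """Return True if the comment is insulting.
    ask_llm: any function prompt_text -> reply_text."""
    prompt = (
        "User sent me a comment: " + comment + "\n"
        "Tell me if it contains a hidden or obvious insult.\n"
        "Send a response as JSON in this format:\n"
        '{"YourExplanation":"your reasoning here", "InsultDetected":true}'
    )
    for _ in range(3):  # retry, because broken JSON happens
        data = parse_llm_json(ask_llm(prompt))
        if data is not None and "InsultDetected" in data:
            return bool(data["InsultDetected"])
    raise RuntimeError("LLM never returned usable JSON")
```

The only thing the LLM decides here is one boolean; everything else, including what to do when the reply is garbage, stays in your code.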
Multiple Steps
Let's move on to a more complex example. Users of my application learn English words; they can add any word with a particular meaning to their training set. The tasks of the AI agent are:
To find all words with suitable meanings in the database
If nothing is found, then generate new learning content
Let's say the user sent a request that they want to learn the word "milk" with the meaning "exploit or defraud someone". The first step is obvious: we extract from the database all records where Word='milk'; there will be several of them. The second step: for the user's convenience, we filter the records using the LLM:
Tell me what meanings are suitable for the meaning "exploit or defraud someone"
Return JSON:
{
"EnglishWord":"milk",
"Meaning":"exploit or defraud someone",
"OtherMeanings":
[
{"Meaning":"White liquid", "IsSameMeaning":true/false},
{"Meaning":"To milk a cow", "IsSameMeaning":true/false}
]
}
If the output array "OtherMeanings" contains anything with "IsSameMeaning":true, we simply return the result to the user. If nothing is found, then the third step: send another request to the LLM asking it to generate new content, again wrapped in JSON for reliability. The result is saved in the database for future reuse.
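The three steps above are plain control flow around two ordinary function calls and one LLM call. A sketch, where `db_lookup`, `ask_llm`, and `generate_content` are placeholders for your own database access, model call, and content generator:

```python
import json

def add_word_to_training(word, meaning, db_lookup, ask_llm, generate_content):
    """Three-step agent flow as ordinary code.
    db_lookup: word -> list of known meanings (plain SQL, no AI).
    ask_llm: prompt_text -> reply_text.
    generate_content: (word, meaning) -> new learning content."""
    # Step 1: plain database query, e.g. SELECT * FROM words WHERE word = ?
    records = db_lookup(word)
    if records:
        # Step 2: let the LLM filter the candidates by meaning
        listed = json.dumps(
            [{"Meaning": r, "IsSameMeaning": "true/false"} for r in records]
        )
        prompt = (
            f'Tell me which meanings match the meaning "{meaning}".\n'
            f'Return JSON: {{"OtherMeanings": {listed}}}'
        )
        reply = json.loads(ask_llm(prompt))
        matches = [m["Meaning"] for m in reply["OtherMeanings"]
                   if m["IsSameMeaning"]]
        if matches:
            return matches
    # Step 3: nothing suitable found, generate new content
    # (in the real app, also save it to the database for reuse)
    return [generate_content(word, meaning)]
```

Note that the decision of whether to reach step 3 is made by an `if` in your code, not by the model.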
Other Scenarios
A typical scenario for an agentic AI is a support service. The user describes a problem in the chat, and ideally the problem should be solved without involving people. You need to use your imagination here, working from the task rather than from design patterns. For example:
Ask the LLM to determine the category of the problem. Send it the entire user complaint along with a description of your categories ("Parcel did not arrive", "Disputing the withdrawal of money"). Ask it to return true/false for each category in the response
If the category "Disputing the withdrawal of money" is detected, ask the LLM to find the transaction ID in the user's complaint. If it is not found, ask the user: "Send me the ID"
Look up the ID in the database, check for problems, and roll back the transaction. All of this without involving the LLM.
You can also search a local knowledge base if you have one (without the LLM), or even do a web search (still without the LLM), and then optionally feed the search results to the LLM, asking "Is there an answer here?". Do whatever you want, in any order. Your job is to solve the user's problem, not to follow patterns.
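The support scenario above can be sketched the same way. A minimal routing function, assuming hypothetical `find_transaction` and `rollback` backend helpers and the usual `ask_llm` placeholder; only classification and ID extraction go to the model:

```python
import json

def handle_complaint(text, ask_llm, find_transaction, rollback):
    """Support-agent routing as ordinary control flow.
    ask_llm: prompt_text -> reply_text.
    find_transaction / rollback: your own backend functions."""
    # Step 1: classify the complaint; the LLM only returns flags
    prompt = (
        "User complaint: " + text + "\n"
        "For each category, say whether it applies. Return JSON:\n"
        '{"ParcelNotArrived":true/false, "DisputedCharge":true/false}'
    )
    category = json.loads(ask_llm(prompt))
    if category.get("DisputedCharge"):
        # Step 2: ask the LLM to extract the transaction ID, nothing more
        extract = ('Find the transaction ID in this complaint. Return JSON '
                   '{"TransactionId": "id or empty string"}:\n' + text)
        tx_id = json.loads(ask_llm(extract)).get("TransactionId", "")
        if not tx_id:
            return "Send me the ID"
        # Step 3: classic code: look up the record and roll it back
        if find_transaction(tx_id):
            rollback(tx_id)
            return f"Transaction {tx_id} rolled back"
        return f"Transaction {tx_id} not found"
    return "Forwarded to a human operator"
```

The database lookup and the rollback never touch the model; the LLM is used twice, each time for one narrow question with a JSON answer.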
Unnecessary Complications
It's very fashionable now to move program logic into the LLM, but this is the worst thing you can do. Examples:
A user sends a request describing the product they want. You send the request along with your database schema to the LLM and ask it to generate an SQL query, which you then execute
A user sends a request to technical support. You send the description of your API to the LLM and ask it to decide which method to call. All the logic is described in natural language.
This will work, but it will be "Expensive, Slow and Poor". Each LLM call costs money and takes several seconds to execute. The simpler your requests to the LLM, the fewer hallucinations you get from it. And I haven't even mentioned how hard such programs are to debug, or the security issues.
Conclusion
GPT-3.5 could already return JSON; GPT-4 does it better, but nothing has changed fundamentally. An LLM is a great tool, and you can do incredible things with it if you use your brain. AI agents are mostly source code that programmers write, plus a few LLM calls.