Video: Building With Tavily: A Live Coding Agent Demo + Open Builder Discussion | Duration: 2768s | Summary: Building With Tavily: A Live Coding Agent Demo + Open Builder Discussion | Chapters: Welcome & Introduction (25.795s), Speaker Introductions (191.41s), Journey Agent Evolution (306.32s), Coding Agent Challenges (380.83s), Agent Building Demo (606.69s), JetBrains AI Tools (1032.445s), Tavili CLI Integration (1179.205s), Building Junie Agent (1413.24s), Context Management Strategies (1688.975s), LLM Agnostic Design (1908.925s), Future Roadmap (2166.555s), Q&A and Wrap-up (2420.825s)
Transcript for "Building With Tavily: A Live Coding Agent Demo + Open Builder Discussion":
Alright. I think we are live. Hey, everyone. Hello? We'll wait a few minutes here to get started as everyone rolls in. Hey, Jonathan from New Jersey. For some context, I am in New York right now. And Nick? Yeah. I'm joining from Amsterdam. Cool. Let me share my screen here. Can everyone see that okay? Perfect. We'll wait one more minute here to begin. Awesome. Let's start. Welcome, everyone. Today, we're doing a live coding session with Tavili and Nick here from JetBrains. Super excited to have you all attend. What we're gonna talk about today is Juni, JetBrains' new coding agent, Agenza coding agent. We're gonna talk about web search with coding agents, the problems we see today, how they work together. You'll hear a lot about what Nick and I do, and we want this to be interactive. So please share any questions you have. I'll show a few demos showing the power of web search within CodingAgents. And, yeah, let's let's get started. Let's start here with some introductions. So I'm Evan. I'm from Toronto, Canada. I'm a forward deployed engineer at Teveely, and this means that I work with customers to help implement our web interaction layer into production. So through to really search, crawl, extract, research, all of our endpoints, I focus on applied engineering and figuring out the best way to use these endpoints and get them in into production and a lot of different types of systems. Nick, could you introduce yourself? Sure. Hi, Ron. I'm Nick Froloff. I'm Helen Junie, coding agent here at JetBrains. And JetBrains is you probably were is the company behind tools like Idea, PHP Storm, and Azure's IDE for coding. And here, JetBrains connecting the existing intelligence of JetBrains tools with AI. Haven't given it back to you. Hello? I think I might have cut out there with my Wi Fi there. Nick, you finished introducing yourself? Looks looks looks like that. Yeah. I I I I just did. Yeah. Awesome. I think we we should be good to go. move. to the next slide here. Do you mind. sharing a bit about. Absolutely. Yeah. So Journey is actually not new new. We started with a with a Journey agent last year. Last year, it was working primarily just in JetBrains IDs. And what happened since March, I think about months ago, we add support for multiple different surfaces across not just IDE, but common line interfaces, CICD support, a number of others. And, yeah, we also extended the support of different different models. We extended support of different providers, including OpenRelter, including, yeah, a number of others. So, yeah, I think it's a very interesting time for us to expand. And as we're gonna cover today, we also added support of web search together with Tavilli. And, yeah, if you wanna learn more about Juni, you can visit our website at juni.jetmines.com. Thank you, Nick. Moving forward, the challenge we see with coding agents today, they've seen such a large growth. Think about Cloud Code, Cursor, Codecs, now with Junie. And the issues we have with coding agents is that LLMs have training cutoffs. So if a package gets updated or if a package As time goes on, new code gets created, and we wanna make sure that these coding agents have access to the latest and best information. Additionally, official docs cover everything. So let's say you take the lane chain docs or fast API docs. You'll blow up the context window of these coding agents, and you wanna pass the coding agent the most relevant, accurate context in the least amount of chunks possible. Therefore, the coding agent can do the best work and not bloat in context, and you will save on costs limiting the input tokens to the coding agent. For example, if we're working on a task in Python, the agent knows to search to VLE for Python documentation, and you won't get back any JavaScript or TypeScript or other languages into the context window, confusing the coding agent. So you're able to query for specific information. Today, I prepared a short demo, so I'm gonna share stop sharing here and share my desktop. And whoever can see this here. So first, I'm gonna show OpenCode. And OpenCode is an open source agent harness where you can code and build things. And I've configured OpenCode to not have any web search access. So it doesn't access to Veely. It can't search the web. It just relies on the code itself and the harness to build an agent. I have a prompt here, which I'm gonna pass to both OpenCode and to Junie, and the task is to write a Python script that uses Langchain to create an agent with Tivoli web search. The agent should take a question, search the web, and return a grounded answer. I asked to connect it to the latest OpenAI model, so to know, let's say, we're using g p d five, UV for Python, and use the latest LangChain framework. If I open back here, we can see that recently, LangChain upgraded to LangChain v one. So there's all these new standards where they deprecated some old agent building frameworks, like LaneGraph prebuilt create React agent. And what you'll see is that if the coding agent doesn't have access to web search, it'll use a deprecated package, which will limit results when you go and run the code. You'll get deprecation errors. So let's pass in. Let's close this and pass in this prompt. So it reads the the prompt. It creates the to dos, and we'll see it, like, writing code, and we'll look at the results here. I assume this setup here is familiar to a lot of the audience as code becomes more agentic written with tools like Junie and IDEs that have coding agents where it's more prompting instead of writing code. So it's looking for packages. At the same time here, I'm gonna initialize Junie. Junie comes up. I'm gonna put in the same prompt, And it's gonna reason, understand the information. And as you see here, has the ability to use web search with Tevili. So it understands that it needs the latest lane chain frameworks. It can do a search result for the latest LangChain agent building frameworks. It can look for the latest OpenAI models. It does another web search for latest Langchain to VLE packages, and now it understands it has all the necessary context, and it's gonna go build us this agent. So we see here that it created an agent using the create agent framework, which is the newest package from Linkchain and had a correct implementation of the prompt. And now it's gonna go run, execute the code, install the imports, and make sure every make sure everything runs okay. Let's check back in open code here. It's still cooking. Juni finished. It created an agent dot py. I'm a language trained agent that takes a user question and searches the web with Tevilli. I use the newest OpenAI model, GPT five, based on the ability to search the web. If not, it might have used an old deprecated model like GPT four o or not as new model, and then it it made sure that it used create agent, chat OpenAI, and the new language change to VLE package here. We're gonna go back here. We're gonna allow it to build on OpenCode, and we're gonna see the results there. Let me return back. Here real quick to see if there's any messages or any questions in the chat. Love the head to head comparison, by the way. Yeah. Yeah. So both that's a great question. They're both use using Sonnet 4.6. Or Linkchain and Tevilli, the package exposes the Tevilli search tools as tools that you can pass into an agent framework when you're building with Landgraf. So you're able to use you're able to import to really search in the to really link chain package and use it when you build agents with LangGraph. Can everyone still hear me? Okay. I can hear it loud and clear. Nice. In the chat, what are what are your, most common, what tools do you guys use to code? Like, do you use Cloud Code? Do you guys use JetBrains IDE? Do you use Cursor? What are you guys most often using? I'd like to hear more about your guys' coding setups. Cool. A lot of cloud code. That's understandable. One thing that's cool about Joonie, when I was setting it up, is that all your skills and MCP tools and agent dot m d files, when you install Junie, you're able to transfer all those files natively into your dot Junie folder so you can continue using it. Cursor, but I miss Jet's brains. Yeah. Found out to when trying to use OpenClaw. Yeah. That was awesome. Last week or maybe the week before, we got integrated as a native search partner with with OpenClaw. So now when you download your OpenClaw, you can just use to view it right away during setup. Let me check back in with OpenCode here and see the results. I'll share my screen again, show the results here. So it used the old create react agents, and, it used g p t 4.1, which is an old OpenAI model even though I asked it to use the latest OpenAI models. So that's an example, like, a head to head comparison of the difference of the ability to search the web, retrieve the latest context versus relying on the model's pretrained data. Miles, I'm reading your question here. Maybe it's more for Nick. Being a fanboy. Sure. In it, there have been obviously a lot of things happening in the IT world. And for us at JetBrains, it's been, I would say, very transformative here. There is a lot of being done here internally to focus on on AI lately and really support the new paradigms of how the developers works. By the way, you can now also use your Courser within well, Courser agent within JetBrains as well. So, Miles, for you, that's something which you might both use familiar tools and and the Courser agent along with. But, yeah, overall, I think we we're really stepping up on this and going full speed. There's a lot of projects which we announced, I think, including the JetBrains Central, including JetBrains AIR, which is our orchestration tool, and, of course, Judi, which will touch base a bit more in details. So I put some of the slides back up here. Let me know if I ever can see those again. It's giving me some errors when I try to share the slides. Maybe Jackie or Paige said we can see them. Great. So as you guys saw in the demo, we were able to get task aware retrieval. So the agent was able to understand. We're looking for lane chain. We're looking for the newest OpenAI models, search the Internet. What you don't see, what's abstracted away in Junie's implementation is the raw results you return back from the raw results you return back from Tavili, and those are in context chunks. So you don't get the whole page. You get token efficient information dense information. In addition, there's low latency there with the searches. Each search takes around, like, two seconds or a second and a half, which allows the agent to give you your code in a timely manner. And the last thing where there's a where there's a bigger risk is there's prompt injection detection with Tevilli. And what this does is that I'm sure you guys are mostly familiar, but if we're ingesting information from the web and we're putting it into code and we have something like brave mode turned on or we auto accept code into our coding environments, we have to be wary of where this code is coming from or where this information is coming from because down the line, this code can affect your systems or your company's information. So what Tivili does is that when we return the information from the Internet, we check for prompt injections. So I heard a lot of you say, Kiro, Codex, OpenClaw, your coding, and all these different platforms. And the paradigm shift that we felt at Tevili was that the agent is able to execute code for you, and that's why we've been pushing for agent skills, and we built our Tevili CLI. And what this means is that you can install the CLI and the skills into whatever platform you use to interact with agents. And the agent will have access to the CLI to execute each command, search, extract, research, crawl, and it allows your agent, no matter who you are, what you're doing, access to TeVely. So we released this this month or in March, and we're excited about it. We've seen great usage and great feedback using the CLI. I wonder in the chat, do you guys have you guys switched over to using CLIs over MCPs, or do you guys both rely on kind of both systems, view them differently? Yeah. Jeffrey is still using MCP. I feel like as the paradigm has moved to OpenCLI and these, like, always on systems, the CLI has allowed my agents to use less tokens because you're not passing in the instructions. You can pass them in more real time. Jonathan switching CLI for quick fixes bugs. I think now in this session, Nick, it would be helpful to hear from you a bit about your experience building Junie, building the coding agent. If you can explain a bit how you've integrated to Veely, how you kind of view the the space that we're in, being from a great company like JetBrains, it'd be intriguing to hear. No. Absolutely. Yeah. So, obviously, there there is a lot of existing, let's say, APIs providers. But what I think the the past approach of the of the search APIs was that that they would provide you some sort of, yeah, similar similar interface like you would see on a typical web search where you would get some metadata about the page, and then you would need to actually get the the HTML from the page, and then you would need to to go for this HTML. And what stands out into those that are three d optimized for agenda queues? You already mentioned the chunking and talking efficiency, which is already a big part of it because you're stripping down a lot of HTML tags and params from there, which basically for search purposes don't really have the value but do have the the token cost. So that that that stands out a lot. And also the the the chunk in itself, ability to to control that is is a great use. So, yeah, having the API which is agentic focused has been a big help for us in this regard, and that that really, I think, interesting part about the Tavila approach. In general, for us, it's it's a balancing act with with using the the web search. The way we kind of trigger this whole process is without going to too much specifics, I would say we we approach Juni to have the so called router architecture. Right? We have the the the router which initially is Samsung which received the the user input and put it into a pipeline. And and we start with a, yeah, with the task level rolled in. We then assess if we need to provide for this particular tool lead the capabilities. Oh, I think I lost Evan there. Okay. You guys still have me? Yes. Perfect. Great. Yeah. And and then after we provide the capability and web search is one of those capabilities, we also have the the intent routing, which is the for agent to decide whether we should go and look for the answer within the code base, look for the answer within our existing agent memory, or we need to go and have external answer. So all of this has become the kind of the the pipeline behind the scenes, Junie does in order to go and and do the search. And as Evan already pointed out, the the use cases for that are, yeah, quite quite real. The the good example which he demonstrated in terms of the the difference of knowledge about the the model, but it's not just that. I think the the important part is knowledge about the latest version of the, yeah, of the packages or the the libraries. I I've seen it myself quite often, and the reason is, obviously, the the knowledge cut off by by the models. So this is really important part in in our GenTeX pipeline. And and if you build a new own agents or GenTeq harnesses, you also will encounter head end solutions like that will do. Alright. How do I handle context window limits? Yeah. So at ours, specifically in Joonie, we have somewhat different approach to to some other agents, meaning that we do maintain so called permanent so after every after every task which Juni does, we do have the the pipeline of history processing where we're trying to actively compact and compress the existing existing context. And we're trying to remove the the old old irrelevant parts and update them. And like I said, it's sort of the something which does in every step, but it triggers when the we we're getting reaching out to the context limits. And like I said, already being able to fit within those limits with your web search is is quite quite an important setup. Yeah. Another thing which I think I haven't I haven't really touched is the the mechanism where the models are quite well aware of your existing popular libraries or popular open source libraries. But the moment you go beyond, let's say, 20, top 30, top 50 packages, the models will not know them. And this is where the the being able to do the web search for them is is quite important. I myself encountered that when I had a project where I think we implemented the connection with Telegram Messenger, and their the model has just been hallucinated and making up functions and methods which did not exist. Alright. There was another question from Jeffrey here. How does Juni determine when to call the search tool? What is a roller? Yeah. I I kind of touched on that already within the was in my previous setup. Like I said, the the the mechanism have three different layers. One, in general, we're trying to to to see where the the call fit in the pipeline. We do check some of the allow permissions. There is a way to to kind of control that. And then we do allow model to call the tool and and and get the results. Yeah. Nick, I have a question for you here. As we see, Sure. I know JetBrains works a lot with, like, large companies using JetBrains' IDEs or tools to build production ready code. AgenTic tools like Junie, how do you, like, view adoption, while also having the necessary safeguards in place, where where code review is still necessary and where this agent harness can't kind of replace the software engineer but collaborate with an engineer, and to be more efficient, where do you see Junie going? Like, is it gonna be integral to the JetBrains platform of building with agent coding agent? Some thoughts there. Yeah. Absolutely. So overall, I don't I don't think I have the hard numbers right now with me, but I think we've seen the steady increase of the agentic use within the within IDs. I would say if, let's say, last year, it was maybe less than ten percent, I would think I will not be far off if I would say that this year, we're seeing percentages of getting to twenty five plus percent of people using some AI or genetic functionality within IDs. The other part, I think, which you which you mentioned there is very interesting in terms of how the whole, I would say, is a development workflow pipeline, looks like. I would say there is definitely a lot of shift happening now to looking at the of ability. I I I hear in some echo, maybe, Evan, on your side. I'm not sure. We we we're seeing that amount of code drastically increased. Right? The amount of generated code. The reason for that is obviously that the models are being able to generate that for us. We actually do have a setup, and we're working on even the the the next version of the code reviews. So one version of the code review, which we already have live now, is where, essentially, code review being integrated in your CICD pipeline and triggered within the pull request, let's say. I think this is the most popular feature which we internally use because, again, it it really helps to point out the the the different aspects which you haven't really seen. And also, if you set up the the right guidelines and which align with your with your team coding standards or run running your team linters or tests, then it becomes super, super helpful setup. The the next version which we're working on is a little bit more related to local code reviews, which are related to developers being able to review together with agent the changes before they hand over the PR. So it helps them to prepare to data comments from agent on different parts of of the intended PR and getting the critical review on them from, again, look looking at your existing coding standards. So I really hope that we will get it out there by April. But our early access program, which is using that giving us a really cool feedback on that. So, hopefully, we'll we'll share it with a wider audience really soon. Cool. Very insightful. One thing which I like about Junie is the how it's LLM agnostic. And at TeVealy, like, being a web search layer, we're also LLM agnostic. You can use us with any LLM provider. So in my work, my day to day, I'm building agents where I see a lot of different LLMs, and they all kind of behave differently or the vibe of them is different. And I'm wondering, from your experience building an agent harness, have you seen differences when you use, Junie, let's say, with a anthropic model or an OpenAI model or maybe, like, Minimax? And does that, does it vary your harness at all based on the model? If it's a more of a reasoning model or if the model has, like, larger capabilities, does it change how you guys internally, build the harness? Excellent question. Yeah. Really, we do we do run quite a lot of benchmarks on all kind of models. And we not just test the, let's say, the obvious result in terms of how model the the success rate of model, let's say. But we also look at the aspects including the cost per task or we're looking at aspect of amount of step per task or for looking at aspects of amount of tool calls or parallel tool calls. And it really interesting to look how how volatile actually the models behave from from update to update. Even the same model will will have very different changes. We do have opportunity, fortunately, to to communicate with the model providers and provide them our feedback on that. And they do incorporate some of that into into the next version. I think the most interesting outcome for us, we've been looking for a way to find this perfect ratio between the quality and cost and playing with different models. And for us, Gemini Flash fit right into the setup, which many people kind of look at the Gemini Flash down as a kind of not a model for series coding task, but we actually been very surprised with that. I think we published it a few weeks not not fully, but the the benchmark is w e re rebench published results a couple of weeks back. And we've been on par in terms of the solution rate versus quote quote on Opus on latest Opus versus Juni on Gemini flash. And, essentially, the results being was marginal marginally higher for Clot, but was cost per task was, like, 10 times difference. It was about 30¢ a problem on Joonie versus $3 on ClotCot. So, yeah, I think this this makes it pretty interesting working into in in this multi LLM setup because you're able to make the pipelines which, yeah, kind of works for you better. For us, also, we allow you to do some specification for what models will be used for, let's say, custom sub agents. And there, you might use some smaller model for some smaller tasks. We're we're definitely looking at this part too because the the whole software engineering part requires a lot of different tasks. And some requires a high model, and some requires a a slightly lower model. Okay. I see there are some questions from Miles there. Do you guys plan to support connecting to local models? Yeah. We we definitely have plans for that. There is already option to do that right now, though kind of very limited, and you need to play with the model IDs. But, yeah, if you go to the docs and search in our word, we need documentation, something called, like, custom model. You should be able to find how how it works right now. But it's not very user friendly, to be honest, because you need to do it now with a, like, extra arguments and stuff. We will make it better in the future. Definitely, make it more user friendly. Yeah. And then another question from Miles is, I understand you open the doors for prompt looks and such as still happening. Yeah. We do have some guardrails within our within Judy. But, again, there is multiple points of potential attacks, obviously, if if people really do want that. We do plan to to integrate more free prompt review every prompt on the server side, on on the Juni side, but it's not available right now. JetBrains also published in the MCP registry, I think, really soon. And their part of the part of our approach to this registry was that every server which is available at the JetBrains MCP registry is being tested on prompt injections and on security reviews. So that part already will be there by design. Probably Thanks, Nick. That was super insightful. We're gonna move to the last phase of this webinar. We'll do, like, a general q and a, questions about Tevili, questions about JetBrains. It doesn't need to be specific. I see Jonas has one here about LinkedIn data, and Tavilli has great coverage of LinkedIn information. You can get information from LinkedIn just by, like, doing a people search. So, like, who is Evan Reimer will return my LinkedIn as one of the search results. And a cool feature that we have is that you can pass in and include domains parameter when you conduct a TBLee search. So you can search Evan Reimer and then include the domain LinkedIn, and that will restrict the results to just domains that are LinkedIn that match my name. So we have lots of customers that use Tevili for, like, company lookup or people lookup or create their own deep research agents that way. Let's see. Any other, any other final thoughts or questions, before we wrap up? Thanks, Miles. Yeah. The certification is is great. I'm interested, Miles. Maybe if you can share some applications you've built with Tevili or if you use it in your own personal coding agents. Cool. Coding agents, Claude, and private apps. Yeah. I've seen a lot of uptake with Claude as our MCP. We have an official Claude connector. So even internally, we're all kind of pilled on Claude co work at the moment. So you use to there and our co work to get web information. Christopher, that's a great question. I think if I understand the question correctly, is that can to really, access, like, paywalled information if somehow you provide the credentials of that source within maybe the payload request. At the moment, no. To really only can extract and get, public web information, but we're working, with, like, providers and understanding private data or payroll information and the possibilities of a user who pays for that payroll information, for that information to come back for just them in a TeVealy search. Jeffrey, that's also a great question on the TeVealy website. We have a section about benchmarks. Some of, like, common benchmarks are simple q QA, browse comp, etcetera. However, when I conduct evaluations with all these customers that I work with, I love to conduct evaluations or benchmarks on their information or relevant information. For example, for JetBrains, a simple QA benchmark wouldn't be relevant for coding agents. You'd wanna test it really on relevant queries to you. If that's all, we're gonna wrap up for today. Thank you everyone for coming, spending time. I hope you learned something, took some valuable insight there. And thank you, Nick, for coming and sharing about to VLE with JetBrains and Junie. I'm excited to use it with my EAP access, in my daily workflows. Thanks, everyone. Thank you for having me. Cheers.