Latest Tutorials

Learn about the latest technologies from fellow newline community members!

  • React
  • Angular
  • Vue
  • Svelte
  • NextJS
  • Redux
  • Apollo
  • Storybook
  • D3
  • Testing Library
  • JavaScript
  • TypeScript
  • Node.js
  • Deno
  • Rust
  • Python
  • GraphQL

Common Statistical LLM Evaluation Metrics and what they Mean

In one of our earlier articles, we touched on statistical metrics and how they can be used in evaluation; we also briefly discussed precision, recall, and F1-score in our article on benchmarking. Today, we’ll go into more detail on how to apply these metrics directly, and on more complex metrics derived from them that can be used to assess LLM performance. We’ll start with precision, a standard measure in statistics that has long been used to measure the performance of ML systems. In simple terms, precision measures how many samples are correctly categorised (true positives) out of the total set of samples the model predicts to be positive (true positives + false positives). Take a simple example of an ML tool that accepts a photo as input and tells you whether there is a dog in the picture: its precision is the fraction of photos it flags as containing a dog that actually do.
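To make that concrete, here is a minimal sketch in Python; the function name and the counts are purely illustrative, not taken from the article:

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of positive predictions that are correct: TP / (TP + FP)."""
    return true_positives / (true_positives + false_positives)

# Illustrative numbers: the model flagged 120 photos as containing a dog,
# and 90 of those actually did.
print(precision(true_positives=90, false_positives=30))  # 0.75
```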

How Good is Good Enough: Subjective Testing and Manual LLM Evaluation

In our previous article, we talked about the highest level of testing and evaluation for LLM models, and went into detail on some of the most commonly used benchmarks for validating LLM performance at a high level. Today, we’re going to look at some finer-grained evaluation metrics that you can use while building an LLM-based tool. Here we make the distinction between statistical metrics, that is, those computed using a statistical model, and more generalised metrics that attempt to measure the more ‘subjective’ elements of LLM performance (such as those used in manual testing) and that use AI to evaluate how useful a model is in its given context. In this article we’ll give an overview of the different classes of metrics and cover human evaluation and its importance, before moving on to common statistical metrics and LLM-as-judge evaluations in the following articles.


How To Build Beautiful, Responsive UIs in Minutes With Bolt

Welcome! This is part 5 of our course on how to build fullstack apps with Bolt and Supabase. If you’re just joining, I highly recommend you take the course in order before diving into this one. Here you can find Part 1, Part 2, Part 3, and Part 4.

MCP Explained: Taking Your AI Agents to New Heights

If you’re into AI agent development, you’ve probably started hearing more and more about a new emerging protocol: the Model Context Protocol (MCP). In essence, this protocol simplifies how AI agents connect to the data and tools they need. By standardizing these connections, MCP reduces the extra work developers usually have to deal with: it replaces the need to directly manage multiple APIs in your AI agent with one unified protocol, and lets you add or remove external tools for your agent with incredible ease, making it more convenient to build complex and flexible AI systems. In this article, we’ll walk you through everything you need to know about MCP, from its core components and main concepts to practical implementations. We will focus specifically on building an MCP server, as it is likely the most useful and most frequently implemented part of the MCP architecture. So, let’s go! The Model Context Protocol, or MCP, is a simple standard, designed and open-sourced by Anthropic, to help AI tools talk to the systems where data lives. Think of it like a USB-C port for AI applications: just as a USB-C port lets you connect different devices with one common plug, MCP lets AI models connect to various data sources and tools without needing custom code for every connection.
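As a taste of what that looks like in practice, here is a minimal sketch of an MCP server built with the FastMCP helper from Anthropic’s Python SDK (the `mcp` package); the server name and the tool are our own illustrative choices, not from the article:

```python
# Minimal MCP server sketch (assumes Anthropic's Python SDK: pip install mcp).
from mcp.server.fastmcp import FastMCP

# The server name is illustrative; clients see it when they connect.
mcp = FastMCP("demo-server")

@mcp.tool()
def add(a: int, b: int) -> int:
    """Add two numbers."""
    return a + b

if __name__ == "__main__":
    # Expose the tool over stdio; any MCP-capable client can now
    # discover and call it with no custom API glue.
    mcp.run()
```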

Inside AI Agents: Core Principles and How They Remember

As AI continues to evolve, we’re constantly finding new ways to improve it and to use it. Today, AI has gone far beyond being just a chat tool, and one of the most significant evolutionary steps is the creation and adoption of AI agents. With agents, you can deploy AI solutions that autonomously perform real-world tasks: managing customer support, processing large amounts of information in real time, and much more! Basically, any task that benefits from real-time data and reasoning capabilities. This series of articles will help you not only grasp the fundamentals of AI agents, but also gain practical experience building one yourself, covering the crucial theoretical knowledge and concepts as well as how to apply them properly in the real world.
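As a hint at the “how they remember” part of the title, here is a minimal sketch of a conversation-buffer memory in plain Python, with no agent framework; the class and method names are purely illustrative, not from the series:

```python
from collections import deque

class ConversationMemory:
    """Keep the last `max_turns` messages so an agent can reason over
    recent context without its prompt growing without bound."""

    def __init__(self, max_turns: int = 10):
        self.turns = deque(maxlen=max_turns)  # old turns fall off the front

    def remember(self, role: str, content: str) -> None:
        self.turns.append({"role": role, "content": content})

    def recall(self) -> list[dict]:
        return list(self.turns)

memory = ConversationMemory(max_turns=2)
memory.remember("user", "What's the weather in Oslo?")
memory.remember("assistant", "Around 5°C and cloudy.")
memory.remember("user", "And tomorrow?")
print(memory.recall())  # only the two most recent turns survive
```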