Premium Tutorials

Learn about the latest technologies from \newline premium tutorials.

How To Deploy Your Web App With Netlify

Welcome! This is the sixth and final lesson on how to build fullstack apps with Bolt and Supabase. If you’re just joining, you’re in luck, ‘cause we already have tons of content for you to enjoy while turning your app from just an idea into a deployed web application in just an afternoon. Before you dive into this lesson, here’s where you can find Part 1, Part 2, Part 3, Part 4, and Part 5 if you want to get up to speed (which you probably do, otherwise, what exactly are you going to deploy? 🤔)

What is LLM as Judge and Why Should you use it?

In the last article we covered statistical metrics like Perplexity, BLEU, ROUGE and more, as well as some of the statistical concepts that underpin them, their strengths (accuracy, reliability), and their weaknesses (no subjective focus, reliance on reference texts). Between human evaluation (manual testing) and statistical measures, we get a mix of high-value qualitative assessment on a small part of the test surface and a rigorous but limited view over a wider area. That still leaves a lot of middle ground uncovered! That’s why there’s been a push over the last few years to get coverage for the space between - something with a level of subjectivity and nuance that also scales up. This is where LLM-as-a-Judge comes in. In our manual testing for LLMs article I compared this to a kind of ouroboros where AI validates AI - but that isn’t necessarily a bad thing. LLMs can do some things better than humans, and LLM-as-a-Judge plays to those strengths - though it does not replace the need for human oversight and statistical assessment. There are also metrics that combine LLM-as-a-Judge with statistical metrics - but we’ll talk more about that later.
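To make the pattern concrete, here’s a minimal, library-agnostic sketch of LLM-as-a-Judge. The prompt wording, the judge_answer helper, and the call_llm parameter are illustrative assumptions, not taken from the article - swap in whatever client you actually use:

```python
# Minimal sketch of LLM-as-a-Judge: one model grades another model's answer.
# `call_llm` stands in for whatever client you use (OpenAI, a local DeepSeek
# endpoint, etc.) -- it just needs to take a prompt string and return text.

JUDGE_PROMPT = """You are an impartial evaluator.
Question: {question}
Candidate answer: {answer}
Rate the answer's correctness and helpfulness from 1 to 5 and explain briefly.
Respond as: SCORE: <1-5> | REASON: <one sentence>"""

def judge_answer(question: str, answer: str, call_llm) -> str:
    """Ask a judge model to score a candidate answer."""
    return call_llm(JUDGE_PROMPT.format(question=question, answer=answer))

# Stub judge so the sketch runs on its own; replace with a real model call.
if __name__ == "__main__":
    fake_judge = lambda prompt: "SCORE: 4 | REASON: Mostly correct, minor omissions."
    print(judge_answer("What is precision?", "TP / (TP + FP)", fake_judge))
```

Because the judge returns a structured score, this scales to thousands of test cases while still capturing some of the nuance a purely statistical metric would miss.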

I got a job offer, thanks in large part to your teaching. They sent a test as part of the interview process, and this was a huge help in implementing my own Node server.

This has been a really good investment!

Advance your career with newline Pro.

Only $40 per month for unlimited access to 60+ books, guides, and courses!

Learn More

Jailbreaking DeepSeek R1: Bypassing Filters for Maximum Freedom

Large language models (LLMs) are powerful tools that can help us with a wide range of tasks. These models are usually built with safety features meant to stop them from generating harmful, inappropriate, or otherwise restricted content. However, over time, researchers and enthusiasts have discovered ways to bypass these safeguards—a process known as jailbreaking. In this series of articles, we’re going to show you how to jailbreak one of the most popular open-source models out there: DeepSeek R1. In this opening article, we'll start with prompt jailbreaking. But don’t worry—we’re not just jumping straight into prompt examples. First, we’ll explain what jailbreaking really is, why people do it, and some of the tricky parts you should know about. Sound good? Let’s dive in! DISCLAIMER: This article is for learning and research purposes only. The methods shared here should be used responsibly to test AI, improve security, or understand how these systems work. Please don't use them for anything harmful or unethical.

Next-Level Cursor: Cmd+K, Composer, and Agent Unpacked

In this article, let’s continue to explore Cursor even further. Our first article (which you can find here) covered Cursor’s basics and the easiest-to-understand features, such as Rules for AI, Tab autocompletion, and the Chat feature. So, if you’re new to Cursor, I highly recommend you check out the previous article first. In this “Part 2” article, we’ll go over the Cmd+K, Composer, and Agent features, including some use cases. So, get ready to learn how to use Cursor to its fullest potential and save an enormous amount of time. Starting with version 0.46, Cursor introduces a number of UI changes to the AI side panel. That’s why, if you’re currently using an older version, the UI elements mentioned in this article might not look the same for you. That’s completely fine, but I highly recommend you update to the latest version so we’re on the same page.

Common Statistical LLM Evaluation Metrics and what they Mean

In one of our earlier articles, we touched on statistical metrics and how they can be used in evaluation - we also briefly discussed precision, recall, and F1-score in our article on benchmarking. Today, we’ll go into more detail on how to apply these metrics directly, as well as more complex metrics derived from them that can be used to assess LLM performance. Precision is a standard measure in statistics, and has long been used to measure the performance of ML systems. In simple terms, it measures how many samples are correctly categorised (true positives) out of the total set of samples a model predicts to be positive (true positives + false positives). If we take a simple example of an ML tool that takes a photo as an input and tells you whether there is a dog in the picture, this would be:
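As a rough sketch of what that calculation looks like in practice (the counts below are made up for illustration, not taken from the article):

```python
# Hypothetical results from the dog-photo classifier described above.
true_positives = 40   # photos flagged as "dog" that really contain a dog
false_positives = 10  # photos flagged as "dog" that do not contain a dog

# Precision = true positives / (true positives + false positives)
precision = true_positives / (true_positives + false_positives)
print(f"Precision: {precision:.2f}")  # -> Precision: 0.80
```

In other words, of all the photos the model claims contain a dog, precision tells you what fraction actually do.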