Up to Sunnyvale again today, this time for a Cloud Next ’24 rehearsal session tomorrow. Just a couple of weeks until the event. There are still a couple of tickets left to purchase, so get on that!
[blog] 12 Documentation Examples Every Developer Tool Can Learn From. I’m learning how tricky it is to get documentation “right.” There are so many different audiences, learning goals, and ways to explain a product. This post looks at who does docs well.
[blog] Setting Up Kafka Multi-Tenancy. Here’s how DoorDash does “testing in production” with a multi-tenant architecture that includes their messaging system.
In this tutorial, I'll teach you how to use the react-data-table-component library in your React projects. You'll learn how to render a table with features such as pagination, searching/filtering, and sorting. I'll walk you through each step, starting with setting up a React and TypeScript project with Vite.
Game Bytes is our monthly series taking a peek at the world of gamedev on GitHub—featuring game engine updates, game jam details, open source games, mods, maps, and more. Game on!
After 11 years of development, KeeperRL is celebrating the big 1.0 and is now out of early access. Billed as “the ultimate evil wizard simulator,” KeeperRL is a roguelike base builder that lets you dig into the earth to expand your dungeon and build up to intimidate the countryside with a fortress. The 1.0 release introduces minor villains, new workshops, new items, and many other improvements. So, tent your fingers evilly and check out KeeperRL on Steam.
Canabalt has returned to the web! @AdamAtomic’s 2009 Flash game didn’t invent the endless runner genre: it set it on fire. While it has remained available on various other platforms, it’s back in your browser with an official port by @ninjamuffin99. He tells us that it’s a very faithful port, since it was built with HaxeFlixel, a descendant of both Flash’s ActionScript and Canabalt’s original Flixel framework. Start your daring escape now, or check out the source on GitHub.
The Unity team has released Megacity Metro, a demo for building a game in Unity with a little bit of everything: large scale multiplayer, cross-platform clients, prediction netcode, and server-authoritative gameplay. The open source demo is an interesting peek behind the curtain. If a game uses a lot of what Unity has to offer, what does it look like? Megacity Metro is one impressive answer. Head over to the project site or repo to learn more.
Return to Area 51
Classic first-person shooter Area 51 was originally released in 2005 for PlayStation 2, Xbox, and Windows. The preservationists of Project Dreamland have since shared source code “found at a garage sale of a former THQ developer.” Work is now ongoing to see if the game can be built and run on modern systems. Get yourself ready to pay a visit to “Groom Lake” by checking out the project on GitHub.
Defold, the all-in-one cross-platform game engine and editor, has shipped version 1.7. Defold 1.7 includes a new API for converting world to local coordinates, support for getting and setting sprites’ vertex attributes, and many bug fixes. There’s much more to the release, but you’ll have to read the Defold 1.7 release notes to get all the details.
The Mirror is an all-in-one game development environment, built atop Godot Engine. The no/low-code engine and editor promises to help you “edit a game with friends in real-time.” The project just went open source on March 15. Read the announcement, then head over to the repo to get started.
Discord Embedded App SDK in developer preview
Discord has unveiled a new embedded app SDK. Now in developer preview, the SDK lets developers create multiplayer games and other social activities in an <iframe> that runs directly within the Discord client, handling the coordination between Discord and the third-party applications. We’re excited to see how developers will connect chat communities with their games! Head over to the Discord site for more.
New Phaser Docs go wherever you go
Phaser, a desktop and mobile web game framework, has shipped a brand new docs app called Phaser Explorer. Phaser Explorer is a new way to check out reference documentation, play with sample code, and explore the Phaser API. In a standout move for gamedev docs, Phaser Explorer is a progressive web application (PWA), so it works offline. Plus, in some browsers (such as Microsoft Edge), you can install the PWA as a standalone app. Try out Phaser Explorer now.
Game jams
Upcoming jams
Gamedev.js Jam (April 13 to 26)—Build an HTML5 game in 13 days, with a theme to be announced when the jam opens.
Fish Fest (April 1 to 8)—Not some kind of April Fools joke, I swear. All games created must feature fish. PROMINENTLY.
Recently-ended jams
7DRL, the challenge to make and finish a roguelike in 7 days or less, recently wrapped its 20th year of jamming. Though voting is still in progress, it’s already clear that there are many great entries this year. Here are a few entries to play and hack on:
GladiatorRL (source) challenges you with gladiatorial combat, focused on movement and positioning. The game explores a fresh combat system using abilities in space—moving both yourself and your enemies, interrupting or intercepting attacks—not simply bumping into opponents.
This month’s Game Jam Game of the Month may have you sliding dull rocks around, but the game itself is a gem. Lithic, an entry to Brackeys Game Jam 2024.1, is a Sokoban puzzle game with a twist: to proceed, you’ll need some help from the statues that reside in Lithic’s levels. The game won’t just test your logic skills; you’ll solve some lightly challenging word games, too. With excellent art, music, tutorialization, and written dialogue, Lithic is well worth it, so don’t delay sliding into it on itch.io.
AI systems like Bing and Microsoft Copilot (web) are as good as they are because they continuously learn and improve from people’s interactions. Since the early 2000s, user clicks on search result pages have fueled the continuous improvements of search engines. Recently, reinforcement learning from human feedback (RLHF) brought step-function improvements to response quality of generative AI models. Bing has a rich history of success in improving its AI offerings by learning from user interactions. For example, Bing pioneered the idea of improving search ranking and personalizing search using short- and long-term user behavior data.
With the introduction of Microsoft Copilot (web), the way that people interact with AI systems has fundamentally changed from searching to conversing and from simple actions to complex workflows. Today, we are excited to share three technical reports on how we are starting to leverage new types of user interactions to understand and improve Copilot (web) for our consumer customers. [1]
How are people using Copilot (web)?
One of the first questions we asked about user interactions with Copilot (web) was, “How are people using Copilot (web)?” Generative AI can perform many tasks that were not possible in the past, and it’s important to understand people’s expectations and needs so that we can continuously improve Copilot (web) in the ways that will help users the most.
A key challenge of understanding user tasks at scale is to transform unstructured interaction data (e.g., Copilot logs) into a meaningful task taxonomy. Existing methods heavily rely on manual effort, which is not scalable in novel and under-specified domains like generative AI. To address this challenge, we introduce TnT-LLM (Taxonomy Generation and Text Prediction with LLMs), a two-phase LLM-powered framework that generates and predicts task labels end-to-end with minimal human involvement (Figure 1).
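To make the two-phase idea concrete, here is a minimal Python sketch with a stubbed-out LLM call. Everything here, including the `llm` stub, the prompt wording, and the label set, is illustrative only, not TnT-LLM’s actual prompts or taxonomy:

```python
# Hypothetical sketch of TnT-LLM's two-phase flow with a stubbed LLM.
# Phase 1: generate a task taxonomy from a small sample of conversations.
# Phase 2: use that taxonomy to label every conversation at scale.

def llm(prompt: str) -> str:
    """Stub standing in for a real LLM call; returns canned output."""
    if prompt.startswith("Propose a taxonomy"):
        return "writing and editing; programming; data analysis"
    # Labeling: pick whichever label keyword appears in the conversation text.
    text = prompt.split("Assign one label to:")[-1].lower()
    for label in ["programming", "writing and editing", "data analysis"]:
        if label.split()[0] in text:
            return label
    return "other"

def generate_taxonomy(sample_conversations: list[str]) -> list[str]:
    """Phase 1: ask the LLM to propose task labels from a sample."""
    prompt = ("Propose a taxonomy of user tasks for these conversations:\n"
              + "\n".join(sample_conversations))
    return [label.strip() for label in llm(prompt).split(";")]

def label_conversation(conversation: str, taxonomy: list[str]) -> str:
    """Phase 2: classify one conversation against the generated taxonomy."""
    prompt = (f"Labels: {', '.join(taxonomy)}\n"
              f"Assign one label to: {conversation}")
    return llm(prompt)

# Phase 1 on a sample, phase 2 over the full (de-identified) corpus.
taxonomy = generate_taxonomy(["Fix my Python bug", "Edit my cover letter"])
labels = [label_conversation(c, taxonomy)
          for c in ["Help me debug this programming error",
                    "Rewrite this paragraph, writing it more clearly"]]
print(taxonomy)
print(labels)
```

The point of the structure is that the expensive, open-ended taxonomy generation runs once on a sample, while the cheap per-conversation labeling scales to the whole log.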
We conducted extensive human evaluation to understand how TnT-LLM performs. In discovering user intent and domain from Copilot (web) conversations, taxonomies generated by TnT-LLM are significantly more accurate than existing baselines (Figure 2).
We applied TnT-LLM to a large sample of fully de-identified Copilot (web) conversations and traditional Bing Search sessions. The results (Figure 3) suggest that people use Copilot (web) for knowledge work tasks in domains such as writing and editing, data analysis, programming, science, and business. Further, tasks done in Copilot (web) are generally more complex and more knowledge work-oriented than tasks done in traditional search engines. Generative AI’s emerging capabilities have expanded the tasks machines can perform to include some that humans have traditionally had to do without assistance. The results demonstrate that people are doing more complex tasks, frequently in the context of knowledge work, and that this type of work is being newly assisted by Copilot (web).
Estimating and interpreting user satisfaction
To effectively learn from user interactions, it is equally important to classify user satisfaction and to understand why people are satisfied or dissatisfied while trying to complete a given task. Most importantly, this allows system developers to identify areas for improvement and to amplify and suggest successful use cases to broader groups of users.
People give explicit and implicit feedback when interacting with AI systems. In the past, user feedback was in the form of clicks, ratings, or survey verbatims. When it comes to conversational systems like Copilot (web), people also give feedback in the messages they send during the conversations (Figure 4).
Our framework, SPUR, estimates satisfaction with three chained prompts. The supervised extraction prompt extracts diverse in situ textual feedback from users interacting with Copilot (web).
The summarization rubric prompt identifies prominent textual feedback patterns and summarizes them into rubrics for estimating user satisfaction.
Based on the summarized rubrics, the final scoring prompt takes a conversation between a user and the AI agent and rates how satisfied the user was.
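The three stages above can be sketched as a pipeline of chained LLM calls. In this sketch the LLM is stubbed out, and the prompt wording, canned outputs, and 1–10 scoring scale are illustrative assumptions, not the actual SPUR prompts:

```python
# Illustrative sketch of SPUR's three chained stages, with a stubbed LLM.

def llm(prompt: str) -> str:
    """Stub for a real LLM call; returns canned stage-appropriate output."""
    if prompt.startswith("Extract feedback"):
        return "praised the response; asked for a correction"
    if prompt.startswith("Summarize rubrics"):
        return "explicit praise => satisfied; repeated corrections => dissatisfied"
    if prompt.startswith("Score satisfaction"):
        return "8"  # satisfaction on an assumed 1-10 scale
    return ""

def extract_feedback(conversation: str) -> list[str]:
    """Stage 1: pull in situ textual feedback out of a conversation."""
    raw = llm(f"Extract feedback:\n{conversation}")
    return [item.strip() for item in raw.split(";")]

def summarize_rubrics(feedback_items: list[str]) -> str:
    """Stage 2: distill recurring feedback patterns into scoring rubrics."""
    return llm("Summarize rubrics:\n" + "\n".join(feedback_items))

def score_satisfaction(conversation: str, rubrics: str) -> int:
    """Stage 3: rate one conversation against the summarized rubrics."""
    return int(llm(f"Score satisfaction (rubrics: {rubrics}):\n{conversation}"))

conversation = "User: Thanks, that fixed it!  Agent: Glad to help."
feedback = extract_feedback(conversation)
rubrics = summarize_rubrics(feedback)
print(score_satisfaction(conversation, rubrics))
```

Note that only the first two stages need labeled data; once the rubrics are summarized, the scoring prompt can run over unlabeled conversations at scale.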
We evaluated our framework on fully de-identified conversations with explicit user thumbs up/down in Copilot (web) (Table 1). We find that SPUR outperforms other LLM-based and embedding-based methods, especially when only limited human annotations of user satisfaction are available. Open-source reward models used for RLHF cannot serve as a proxy for user satisfaction, because reward models are usually trained with auxiliary human feedback that may differ from the feedback of the user who was actually involved in the conversation with the AI agent.
| Method | Weighted F1-score |
| --- | --- |
| Reward (RLHF) | 17.8 |
| ASAP (SOTA of embedding) | 57.0 |
| Zero-Shot (GPT4) | 74.1 |
| SESRP (GPT4) | 77.4 |

Table 1. Performance comparison between models for user satisfaction estimation.
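For readers unfamiliar with the metric in Table 1: weighted F1 averages per-class F1 scores, weighting each class by its support. A quick illustration on toy labels (the data here is made up, not the paper’s evaluation set):

```python
from collections import Counter

def weighted_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Per-class F1 averaged with weights equal to each class's support."""
    support = Counter(y_true)
    total = 0.0
    for c in set(y_true):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        total += support[c] * f1
    return total / len(y_true)

# Toy example with made-up "sat" / "dissat" thumbs labels.
y_true = ["sat", "sat", "sat", "dissat", "dissat"]
y_pred = ["sat", "sat", "dissat", "dissat", "sat"]
print(round(weighted_f1(y_true, y_pred), 3))  # 0.6
```

Weighting by support matters here because satisfaction labels are typically imbalanced: thumbs-up feedback usually far outnumbers thumbs-down.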
Another critical feature of SPUR is its interpretability. It shows how people express satisfaction or dissatisfaction (Figure 6). For example, we see that users often give explicit positive feedback by clearly praising the response from Copilot (web). Conversely, they express explicit frustration or switch topics when encountering mistakes in the response from Copilot (web). This presents opportunities for providing a customized user experience at critical moments of satisfaction and dissatisfaction, such as resetting context and memory after a topic switch.
In the user task classification discussed earlier, we know that people are using Copilot (web) for knowledge work and more complex tasks. As we further apply SPUR for user satisfaction estimation, we find that people are also more satisfied when they complete or partially complete cognitively complex tasks. Specifically, when regressing task complexity on the SPUR-derived summary user-satisfaction score, we find generally increasing coefficients on increasing levels of task complexity when using the lowest level of task complexity (i.e. Remember) as a baseline, provided the task was at least partially completed (see Table 2). For instance, partially completing a Create-level task, which is the highest level of task complexity, leads to an increase in user satisfaction that is more than double the increase when partially completing an Understand-level task. Fully completing a Create-level task leads to the largest increase in user satisfaction.
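To make that regression concrete: with one dummy variable per non-baseline complexity level and a balanced design, the OLS coefficient for a level is simply that level’s mean satisfaction score minus the baseline (Remember) mean. A minimal sketch on synthetic data (the level names follow the report’s complexity taxonomy, but the numbers are fabricated for illustration):

```python
from statistics import mean

# Synthetic satisfaction scores by task-complexity level (made-up numbers).
scores = {"Remember": [3.0, 3.2, 2.8],
          "Understand": [3.6, 3.8, 3.4],
          "Create": [4.5, 4.7, 4.3]}

# Dummy-coded OLS with "Remember" omitted as the baseline: each level's
# coefficient is its mean score minus the baseline mean.
baseline = mean(scores["Remember"])
coefs = {level: round(mean(vals) - baseline, 2)
         for level, vals in scores.items() if level != "Remember"}
print(coefs)  # coefficients increase with task complexity
```

In this toy run, the Create-level coefficient comes out larger than the Understand-level one, mirroring the pattern the report describes of satisfaction lift growing with task complexity.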
These three reports present a comprehensive and multi-faceted approach to dynamically learning from conversation logs in Copilot (web) at scale. As AI’s generative capabilities increase, users are finding new ways to use the system to help them do more and shift from traditional click reactions to more nuanced, continuous dialogue-oriented feedback. To navigate this evolving user-AI interaction landscape, it is crucial to shift from established task frameworks and relevance evaluations to a more dynamic, bottom-up approach to task identification and user satisfaction evaluation.
[1] The research was performed only on fully de-identified interaction data from Copilot (web) consumers. No enterprise data was used, per our commitment to enterprise customers. We have taken careful steps to protect user privacy and to adhere to strict ethical and responsible AI standards. All personal, private, or sensitive information was scrubbed and masked before conversations were used for the research. Access to the dataset is strictly limited to approved researchers. The study was reviewed and approved by our institutional review board (IRB).
In this episode, Ben and Ryan are joined by Joshua Fox, a senior cloud architect at DoiT, to discuss cloud cost optimization. They explore the importance of controlling and understanding cloud costs, the role of good architecture in cost optimization, and strategies for dealing with surprise costs.
Script flipped! Today we’re sharing two interviews of us on Other People’s Podcasts (OPP): Kathrine Druckman from the Open at Intel podcast invited us on the show at KubeCon NA in November and Den Delimarsky hosted Jerod on The Work Item podcast in February.
Changelog++ members save 11 minutes on this episode because they made the ads disappear. Join today!
Sponsors:
Synadia – Take NATS to the next level via a global, multi-cloud, multi-geo and extensible service, fully managed by Synadia. They take care of all the infrastructure, management, monitoring, and maintenance for you so you can focus on building exceptional distributed applications.
FireHydrant – The alerting and on-call tool designed for humans, not systems. Signals puts teams at the center, giving you ultimate control over rules, policies, and schedules. No need to configure your services or do wonky work-arounds. Signals filters out the noise, alerting you only on what matters. Manage coverage requests and on-call notifications effortlessly within Slack. But here’s the game-changer: Signals natively integrates with FireHydrant’s full incident management suite, so as soon as you’re alerted you can seamlessly kick off and manage your entire incident inside a single platform. Learn more or switch today at firehydrant.com/signals
Cloudflare – Cloudflare’s Developer Week is happening April 1-5, 2024. Also you can hang with Adam and the rest of the folks at Cloudflare at the Cloudflare offices in Austin, TX on Wednesday, April 3rd at 5:30pm — register here.