
Windows 11 needs its own Windows XP SP2 moment without AI or bloat, says former Microsoft dev who created Task Manager


Dave Plummer, who created Task Manager and helped bring the famous Pinball game to Windows, says it’s about time Microsoft created a special version of Windows 11 without new features, AI and bloat. Microsoft should have another Windows XP SP2 moment, according to Plummer, but is that something the company would ever do? I highly doubt it.

“No more AI, no more features, just fixes,” Plummer says. “When I was working on Windows XP, the Blaster worm hit. That was a big enough deal that we set aside all feature work. For the next several months, all we did was improve security.”

If you have never used Windows XP, you probably don’t realize why Windows XP SP2 was such a big deal.

What was Windows XP SP2?

Windows XP Service Pack 2 (SP2) was more than a major update to Windows XP; it was almost a re-release of the operating system with dozens of security-focused features. It arrived after worms like Blaster and Sasser showed just how vulnerable always-online Windows PCs really were, especially at home, where there were no corporate firewalls.


SP2 fundamentally changed XP’s default security model, and it almost felt like an “XP 1.5” moment. The release was all about patching major security issues and fixing bugs, not about adding features of little or no value. That’s what Windows 11 needs.

“I argue that it’s time for Microsoft to have another XP SP2 moment. No more AI, no more features, just fixes,” Plummer said.

Up until the Blaster worm attack, Microsoft was focused on adding features that the company thought users would love. After Blaster hit, the company put the fancy features on the back burner and focused on getting the OS back on track.

“I argue it’s time for Microsoft to stabilize, improve, and make the system more performant and more usable for power users like me and probably like you,” Plummer adds.

Plummer is not wrong here.

You can add AI, but still ship a stable OS

AI Agents are coming to Windows 11 taskbar
Credit: Microsoft

Whether it’s Apple or Google, all of Microsoft’s rivals are adding AI features to their products. The difference is that we’re not seeing reports of major stability issues on macOS or ChromeOS. It is possible to have both AI and stability, and most users wouldn’t mind if the AI features were disabled by default.

At this point, we just want a stable Windows 11 without bloat.

If you’ve been reading WindowsLatest.com, you probably know how bad the state of Windows is right now. There’s a new bug with almost every cumulative update. Last month’s update triggered the BitLocker recovery screen, and now we’re seeing reports of an issue where the password icon disappears from the lock screen.

Windows 11 password option
Password option highlighted here is missing on affected PCs

Worse, Microsoft also managed to break Task Manager with a bug that duplicates the process every time you close it.

None of this makes sense. How do you break basic features like the password icon or Task Manager when they’re used every day by the same employees who build Windows?

A Windows XP SP2 moment is all Windows 11 needs

Windows needs to go back to the drawing board. Microsoft should bring back the testers it fired years ago and focus solely on fixing bugs, UI issues and performance concerns.

A decent update focused entirely on bug fixes is all the company needs at the moment. We don’t mind whether you call it SP2 or the Creators Update. What do you think?

The post Windows 11 needs its own Windows XP SP2 moment without AI or bloat, says former Microsoft dev who created Task Manager appeared first on Windows Latest


Ideas: Community building, machine learning, and the future of AI


Behind every emerging technology is a great idea propelling it forward. In the Microsoft Research Podcast series Ideas, members of the research community at Microsoft discuss the beliefs that animate their research, the experiences and thinkers that inform it, and the positive human impact it targets.

In 2006, three PhD students organized the Women in Machine Learning Workshop, or WiML, to provide a space for women in ML to connect and share their research. The event has been held every year since, growing in size and mission.

In this episode, two of the WiML cofounders, Jenn Wortman Vaughan, a Microsoft senior principal research manager, and Hanna Wallach, a Microsoft vice president and distinguished scientist, reflect on the 20th workshop. They discuss WiML’s journey from a potential one-off event to a nonprofit supporting women and nonbinary individuals worldwide; their friendship and collaborations, including their contributions to defining responsible AI at Microsoft; and the advice they’d give their younger selves.

Transcript

[MUSIC]

SERIES INTRODUCTION: You’re listening to Ideas, a Microsoft Research Podcast that dives deep into the world of technology research and the profound questions behind the code. In this series, we’ll explore the technologies that are shaping our future and the big ideas that propel them forward.

[MUSIC FADES]

JENN WORTMAN VAUGHAN: Hello, and welcome. I’m Jenn Wortman Vaughan. This week, machine learning researchers around the world will be attending the annual Conference on Neural Information Processing Systems, or NeurIPS. I am especially excited about NeurIPS this year because of a co-located event, the 20th annual workshop for Women in Machine Learning, or WiML, which I am going to be attending both as a mentor and as a keynote speaker.

So to celebrate 20 years of WiML, I’m here today with my long-term collaborator, colleague, close friend, and my cofounder of the workshop for Women in Machine Learning, Hanna Wallach.

You know, you and I have known each other for a very long time at this point. And in many ways, we followed very parallel and often intersecting paths before we both ended up here working in responsible AI at Microsoft. So I thought it might be fun to kick off this podcast with a bit of the story of our interleaving trajectories.

So let’s start way back 20 years ago, around the time we first had the idea for WiML. Where were you, and what were you up to?

HANNA WALLACH: Yeah, so I was a PhD student at the University of Cambridge, and I was working with the late David MacKay. I was focusing on machine learning for analyzing text, and at that point in time, I’d actually just begun working on Bayesian latent variable models for text analysis, and my research was really focusing on trying to combine ideas from n-gram language modeling with statistical topic modeling in order to come up with models that just did a better job at modeling text.

I was also doing this super-weird two-country thing. So I was doing my PhD at Cambridge, but at the end of the first year of my PhD, I spent three months as a visiting graduate student at the University of Pennsylvania, and I loved it, so much so that at the end of the three months I said, can I extend for a full year? Cambridge said yes; Penn said yes. So I did that and actually ended up then extending another year and then another year and another year and so on and so forth.

But during my first full year at Penn, that was when I met you, and it was at the visiting students weekend, and I had been told by the faculty in the department that I had to work really hard on recruiting you. I had no idea that that was actually going to be the start of a 20-plus-year friendship.

WORTMAN VAUGHAN: Yeah, I still remember that visiting weekend very well. I actually met you; I met my husband, Jeff; and I met my PhD advisor, Michael Kearns, all on the same day at that visiting student weekend. So I didn’t know it at the time, but it was a very big day for me.

So around that time when I started my PhD at Penn, I was working in machine learning theory and algorithmic economics. So even then, you know, just like I am now, I was interested in the intersection of people and AI systems. But since my training was in theory, my “people” tended to be these mathematically ideal people with these well-defined preferences and beliefs who behaved in very well-defined ways.

Working in learning theory like this was appealing to me because it was very neat and precise. There was just none of the mess of the real world. You could just write down your model, which contained all of your assumptions, and everything else that followed from there was in some sense objective.

So I was really enjoying this work, and I was also so excited to have you around the department at the time. You know, honestly, I also loved Penn. It was just such a great environment. I was just actually back there a few weeks ago, visiting to give a talk. I had an amazing time. But it was, I will say, very male dominated in the computer science department at the time. In my incoming class of PhD students, we had 20 incoming PhDs, and I was the only woman there. But we managed to build a community. We had our weekly ladies brunch, which I loved, and things like that really kept me going during my PhD.

WALLACH: Yeah, I loved that ladies brunch. That made a huge difference to me and, kind of, kept me going through the PhD, as well.

And, like you, I’d always been interested in people. And during the course of my PhD, I realized that I wasn’t interested in analyzing text for the sake of text, right. I was interested because text is one of these ways that people communicate with each other. You know, people don’t write text for the sake of writing text. They write it because they’re trying to convey something. And it was really that that I was interested in. It was these, kind of, social aspects of text that I found super interesting.

So coming out of the PhD, I then got a postdoc job focused on analyzing texts as part of these, sort of, broader social processes. From there, I ended up getting a faculty job, also at UMass, as one of four founding members of UMass’s Computational Social Science Institute. So there was me in computer science, then there was another assistant professor in statistics, another in political science, and another in sociology. And in many ways, this was my dream job. I was being paid to develop and use machine learning methods to study social processes and answer questions that social scientists wanted to study. It was pretty awesome. You, I think, started a faculty position at the same time, right?

WORTMAN VAUGHAN: Yeah. So I also did a postdoc. First, I spent a year as a postdoc at Harvard, which was super fun. And then I started a tenure track position in computer science at UCLA in 2010.

Again, you know, it was a very male-dominated environment. My department was mostly men. But maybe even more importantly than this, I just didn’t really have a network there. You know, it was lonely. One exception to this was Mihaela van der Schaar. She was at UCLA at the time, though not in my department, and she, kind of, took me under her wing. So I’m very grateful that I had that support. But overall, this position just wasn’t a great fit for me, and I was under more stress then than I think I have been at any other point in my life that I could really remember.

WALLACH: Yeah. So at that point, then, you ended up transitioning to Microsoft Research, right?

WORTMAN VAUGHAN: Yep.

WALLACH: Why did you end up choosing MSR [Microsoft Research]?

WORTMAN VAUGHAN: Yeah, so this was back in 2012. MSR had just opened up this new New York City lab at the time, and working in this lab was basically my dream job. I think I actually tried to apply before they had even officially opened the lab, like when I just heard it was happening.

So this lab focused in three areas at the time. It focused in machine learning, algorithmic economics, and computational social science. And my research at the time cut across all three of these areas. So it felt just like this perfect opportunity to work in the space where my work would fit in so well and be really appreciated.

The algorithmic economics group at the time actually was working on building prediction markets to aggregate information about future events, and they were already, in doing this, building on top of some of my theoretical research, which was just super cool to see. So that was exciting. And I already knew a couple of people here. I knew John Langford and Dave Pennock, who was in the economics group at the time, because I’d done an internship actually with the two of them at Yahoo Research before they came to Microsoft. And I was really excited to come back and work with them again, as well.

You know, even here at the time that I joined the lab, it was 13 men and me. So once again, not great numbers. And I think that in some ways this was especially hard on me because I was just naturally, like, a very shy person and I hadn’t really built up the confidence that I should have at that point in my career. But on the other hand, I found the research fit just so spot-on that I couldn’t say no. And I suspect that this is something that you understand yourself because you actually came and joined me here in the New York lab a year or two later. So why did you make this switch?

WALLACH: Yeah, so I anticipated that I was going to love my faculty job. It was focusing on all this stuff that I was so excited about. And much to my surprise, though, I kind of didn’t. And it wasn’t like there was any one particular thing that I didn’t like. It was more of a mixture of things. I did love my research, though. That was pretty clear to me. But I wasn’t happy. So I spent a summer talking to as many people as possible in all different kinds of jobs, really just with the goal of figuring out what their day-to-day lives looked like. You were one of the people I spoke to, but I spoke to a ton of other people, as well.

And from doing that, at the end of that summer, I ended up deciding to apply to industry jobs, and I applied to a bunch of places and got a bunch of offers. But I ended up deciding to join Microsoft Research New York City because of all the places I was considering going, they were the only place that said, “We love your research. We love what you do. Do you want to come here and do that same research?”

And that was really appealing to me because I loved my research. Of course, I wanted to come there and do my same research and especially with all of these amazing people like you, Duncan Watts, who’d for many years been somebody I’d really looked up to. He was there, as well, at that point in time. There was this real focus on computational social science but with a little bit more of an industry perspective. There were also these amazing machine learning researchers. Just for many of the same reasons as you, I was just really excited to join that lab and particularly excited to be working in the same organization as you again.

WORTMAN VAUGHAN: Yeah, I’m happy to take at least a little bit of the credit for …

WALLACH: Oh yeah.

WORTMAN VAUGHAN: … recruiting you to Microsoft here many years ago.

WALLACH: Oh yeah.

WORTMAN VAUGHAN: Yeah. I was really excited to have you join, too, though I think the timing actually worked out so that I missed your first couple of months because I was on maternity leave with my first daughter at the time. I should say I’ve got two daughters, and I’m very proud to share in the context of this podcast that they’re both very interested in math and reading, as well.

WALLACH: Yeah, they’re both great.

Um, so then we ended up working in the same place. But despite that, it still took us several years to end up actually collaborating on research. Do you remember how we ended up working together?

WORTMAN VAUGHAN: Yeah. So I used to tell this story a lot. Actually, I was at this panel on AI in society back in, I think, it was probably 2016. It was taking place in DC. And someone on this panel made this statement that soon our AI systems are just going to be so good that all of the uncertainty is going to be taken out of our decision-making, and something about this statement just, like, really set me off. I got so mad about it because I thought it was just …

WALLACH: I remember.

WORTMAN VAUGHAN: … such an irresponsible thing to be saying. So I came back to New York, and I think I was ranting to you about this in the lab, and this conversation ended up getting us started on this whole longer discussion about the importance of communicating uncertainty and about explaining the assumptions that are behind the predictions that you’re making and all of this.

WALLACH: So this was something … I was really excited about this because this was something that had really been drummed into me for years as a Bayesian. So Bayesian statistics, which forms a lot of the foundation of the type of machine learning that I was doing, is all about explicitly stating assumptions and quantifying uncertainty. So I just felt super strongly about this stuff.

WORTMAN VAUGHAN: Yeah. So somehow all of these discussions we were having led us to read up on this literature that was coming out of the machine learning community on interpretability at the time. There were a bunch of these papers coming out that were making claims about models being interpretable without stopping to define who they were interpretable to or for what purpose. Never actually taking these models and putting them down in front of real people. And we wanted to do something about this. So we started running controlled experiments with real people and found that we often can’t trust our intuition about what makes a model interpretable.

WALLACH: Yeah, one of the things that came up a lot in that work was, sort of, how to measure these squishy abstract human concepts, like interpretability, that are really hard to define, let alone quantify and measure and stuff like that.

WORTMAN VAUGHAN: Absolutely. So I think one of the first things that we really struggled with in this line of work was what it even means to be interpretable or intelligible or any of these terms that were getting thrown around at the time.

Um, we ended up doing some research, which is still one of my favorite papers, …

WALLACH: Me, too.

WORTMAN VAUGHAN: … with our colleagues Forough Poursabzi, Jake Hofman, and Dan Goldstein. And in this work, we found it really useful to think about interpretability as a latent property that can be, kind of, influenced by different properties of a model or system’s design. So things like the number of features the model has or whether the model’s linear or even things like the user interface of the model.

This was kind of a gateway project for me in the sense that it’s one of the first projects that I got really excited about that was more of a human-computer interaction, or HCI, project rather than a theory project like I’d been working on in the past. And it just set off this huge spark of excitement in me. It felt to me at the time more important than other things that I was doing, and I just wanted to do more and more of this work.

I would say the other project that had a really similar effect on me, which we also worked on together right around the same time, was our work with Ken Holstein mapping out challenges that industry practitioners were facing in the space of AI fairness.

WALLACH: Oh yeah. OK, yep. That project, that was so fun, and I learned so much from it. If I recall correctly, we originally hired Ken, who I think was an HCI PhD student at CMU at the time, as an intern …

WORTMAN VAUGHAN: Yep.

WALLACH: … to work with us on creating, sort of, user experiences for fairness tools like the Fairlearn toolkit. And we started that project—so that was in collaboration with Miro Dudík and Hal Daumé—we started that project by having Ken talk to a whole bunch of practitioners at Microsoft but at other organizations, as well, to get a sense for how they were and weren’t using fairness toolkits like Fairlearn.

And I want to point out that at that point in time, the academic research community was super focused on all of these, like, simple quantitative metrics for assessing the fairness in the context of predictions and predictive machine learning models with this, kind of, understanding that these tools could then be built to help practitioners assess the fairness of their predictive models and maybe even make fairer predictions. And so that’s the kind of stuff that this Fairlearn toolkit was originally developed to do. So we ended up asking all of these practitioners originally just as, sort of, the precursor to what we thought we were going to end up doing with this project.

We also asked these practitioners about their current practices and challenges around fairness in their work and about their additional needs for support. So where did they feel like they had the right tools and processes and practices and where did they feel like they were missing stuff. And this was really eye-opening because what we found was so different than what we were expecting. And there’s two things that really stood out to us.

So the first thing was that we found a much, much wider range of applications beyond prediction. So we’d come into this assuming that all these practitioners were doing stuff with predictive machine learning models, but in fact, we were finding they were doing all kinds of stuff. There was a bunch of unsupervised stuff; there was a bunch of, you know, language-based stuff—all of this kind of thing. And in hindsight, that probably doesn’t sound very surprising nowadays because of the rise of generative AI, and really the entire machine learning and AI field is much less focused on prediction in that, kind of, narrow, kind of, classification-regression kind of way. But at the time, this was really surprising, especially in light of the academic literature’s focus on predictions when thinking about fairness.

The second thing that we found was that practitioners often struggled to use existing fairness research, in part because these quantitative metrics that were all the rage at that point in time, just weren’t really amenable to the types of real-world complex scenarios that these practitioners were facing. And there was a bunch of different reasons for this, but one of the things that really stood out to us was that this wasn’t so much about the underlying models and stuff like that, but it was actually that there were a variety of data challenges involved here around things like data collection, collection of sensitive attributes, which you need in order to actually use these fairness metrics.

So putting all this together, the upshot of all this was that we never did what we originally set out to do with that [LAUGHS], that internship project. We … because we uncovered this really large gap between research and practice, we ended up publishing this paper that characterized this gap and then surfaced important directions for future research. The other thing that the paper did was emphasize the importance of doing this kind of qualitative work to actually understand what’s happening in practice rather than just making assumptions about what practitioners are and aren’t doing.

The other thing that came out of it, of course, was that the four of us—so you, me, Miro and Hal—learned a ton about HCI and about qualitative research from Ken, which was just, uh, so fun.

WORTMAN VAUGHAN: Yeah, and I started to be confronted with the fact that I could no longer reasonably ignore all of these messes of the real world because, you know, in some ways, responsible AI is really all about the messes.

So I think this project was really a big shift for both of us. And in some ways, working on this and the interpretability work really led us to be active in these early efforts that were happening within Microsoft in the responsible AI space. Um, the research that we were doing was feeding directly into company policy, and it felt like it was just, like, a huge place where we could have some impact. So it was very exciting.

So switching gears a bit. Hanna, do you remember how we first got the idea for WiML?

WALLACH: Yes, I do. So we were at NeurIPS. This was back in 2005. It was a … so NeurIPS was a very different conference back then. Now it’s like tens of thousands of people. It’s held in a massive convention center. Yes, there are researchers there, but there’s a variety of people from across the tech industry who attend, but that is not what it was like back then.

So in around … in 2005, it was more like 600 people thereabouts in total[1], and the main conference would be held every year in Vancouver, and then everybody at the conference would pile onto these buses, and we would all head up to Whistler for the workshops.

WORTMAN VAUGHAN: Yep.

WALLACH: So super different to what’s happening nowadays. It was my third time. I think that’s right. I think it was my third time attending the conference. But it was my first time sharing a hotel room with other women. And I remember up at the workshops, up in Whistler, there were five of us sitting around in a hotel room, and we were talking about how amazing it was that there were five of us sitting around talking, women. And we, kind of, couldn’t believe there were five of us. We’re all PhD students at the time. And so we decided to make this list, and we started trying to figure out who the other women in machine learning were. And we came up with about 10 names, and we were kind of amazed that there were even 10 women in machine learning. We thought this was a huge number. We were very excited. And we started talking about how it might be really fun to just bring them all together sometime.

So we returned from NeurIPS, and you and I ended up getting lunch to strategize. I still remember walking out of the department together to go get lunch and you were walking ahead of me. I can visualize the coat you were wearing as you were walking in front of me. And so we strategized a bit and ended up deciding, along with one of the other women, Lisa Wainer, to submit a proposal to the Grace Hopper conference for a session in which women in machine learning would give short talks about their research.

We reached out to the 10 names that we had written down in the hotel room and through that process actually ended up finding out about more women in machine learning and eventually had something like 25 women listed on the final proposal. I think there’s an email somewhere where one or another of us is saying to the other one, “Oh my gosh! I can’t believe there are so many women in machine learning.”

So we submitted this proposal, and ultimately, the proposal was rejected by the Grace Hopper conference. But we were so excited about the idea and just really invested in it by that point that we decided to hold our own co-located event the day before the Grace Hopper conference. And I’ve got to say, you know, 20 years later, I don’t know what we were thinking. Like, that was a bold move on the part of three PhD students. And it turned out to be a huge amount of work that we had to do entirely ourselves, as well.

WORTMAN VAUGHAN: Yeah.

WALLACH: We had no idea what we were doing. But the Grace Hopper folks very nicely connected us with the venue that the conference was going to be held at, and somehow, we managed to pull it off. Ultimately, that first workshop had around 100 women, and there was this … rather than just, like, a single short session, which is what we’d originally had in mind, we had this full day’s worth of talks. I actually have the booklet of abstracts from all of those talks at my desk in the office. I still have that today. And it was just an amazing experience.

WORTMAN VAUGHAN: Yeah, it was. And, you know, you mentioned how bold we were. I just, I really don’t think that any of us at the time realized how bold we were being here, getting this workshop rejected and then saying, you know, no, we think this is important. We’re going to do it anyway. On our own. As grad students.

So I’ve already talked a little bit about some of the spaces that I was in throughout my career where there just weren’t a lot of women around in the room with me. How had you experienced a lack of community or network of women in machine learning before the founding of WiML? And, you know, why do you think it’s important to have that kind of community?

WALLACH: So I felt it in a number of different ways. I think I mentioned a few minutes ago that, like, it was my third time at NeurIPS but my first time sharing a hotel room with another woman. But there were many places over the years where I’d felt this.

So first, as an undergraduate. Then, I did a lot of free and open-source software development, and I was pretty involved in stuff to do with the Debian Linux distribution. And back then, the percentage of women involved in free and open-source software development was about 1 to 1.5 percent, and the percentage involved actually in Debian was even less than that. So that had led me and some others to start this Debian Women Project. And then, again, of course, I faced this in machine learning.

I just didn’t know that many other women in machine learning. I didn’t … there weren’t a large number of senior women, for example, to look up to as role models. There weren’t a large number of female PhD students. And this, kind of, made me sad because I was really excited about machine learning, and I hoped to spend my entire career in it. But because I didn’t see so many other women around, particularly more senior women, that really made me question whether that would even be possible, and I just didn’t know.

Um, I think, you know, thinking about this, and I’ve obviously reflected on this a lot over the years, but I think having a diverse community in any area, be it free and open-source software development, be it machine learning, any of these kinds of things, is just so important for so many reasons. And some of those reasons are little things like finding people that you would feel comfortable sharing a hotel room with.

But many of these things are bigger things that can then have, like, even, kind of, knock-on cumulative effects. Like feeling valued in the community, feeling welcome in the community, having role models, being able to, sort of, see people and say, “Oh, I want to be kind of like that person when I grow up; I could do this.” And then even just representation of different perspectives in the work itself is so important.

The flip side of that is that there are a whole bunch of things that can go wrong if you don’t have a diverse community. You can end up with gatekeeping, with toxic or unsafe cultures, obviously attrition as people just leave these kinds of spaces because they feel that they’re not welcome there and won’t be valued there. And then to that point of having representation of different perspectives, with a really homogeneous community, you can end up with, kind of, blind spots around the technology itself, which can then lead to harms.

WORTMAN VAUGHAN: 100%. So did you ever imagine during all of this that WiML would still be around 20 years later and we would be sitting here on a podcast talking about this?

WALLACH: [LAUGHS] No, absolutely not. I didn’t even think that WiML would necessarily be around for a second year. I thought it was probably going to be, like, a one-off event. And I certainly don’t think that I thought that I would still be involved in the machine learning community 20 years later, as well. So very unexpected.

I’ve got a question for you, though. What do you remember most about that first workshop?

WORTMAN VAUGHAN: I remember a lot of things. I remember that, you know, when we were planning this, we always really wanted the focus to be the research. And, you know, if you think back to what this first workshop looked like, it was a lot of us just giving talks or presenting posters about our own research to other people.

And, you know, I remember thinking at the poster session, like, the vibe was just so much different and better, healthier really than other poster sessions I had been to. Everyone was so supportive and encouraging, but it really was all about the research. I also remember being blown away just walking into that conference room in the morning and seeing all of these women gathered in one place and knowing that somehow, we had actually made this happen.

Um, I remember we also faced some challenges with the workshop early on. What are the challenges that stand out to you most?

WALLACH: Yeah, so a lot of people really got it, right. And they were super supportive. So, for example, folks at Penn totally got it, and they actually funded a bunch of that first workshop. But others in the community didn’t get it and didn’t see the point, didn’t see why it was necessary.

I remember having dinner with one machine learning researcher and him telling me that he didn’t think this kind of workshop was necessary because women’s experiences were no different to men’s experiences. And then later on in the conversation, he talked about—like, you know, this is, like, an hour and a half later or something—he talked about how he and a friend of his had gone to the bar at an all-women’s college and he’d felt so awkward and out of place. And I ended up pointing out to him [LAUGHS] that he just, kind of, explained to himself why we needed WiML. So, yeah, there were some people who didn’t get it, and it took a lot of, sort of, talking to people and, kind of, explaining.

WORTMAN VAUGHAN: Yep.

WALLACH: Another challenge was figuring out how to fund it in an ongoing manner once we decided that we wanted to do this more than once.

So as I said, Penn funded a lot of that first workshop, but that wasn’t a sustainable model, and it wasn’t going to be realistic for Penn to keep funding it. So in the end, we worked with Amy Greenwald to obtain a National Science Foundation grant that would cover a lot of costs, and we also received donations from other organizations.

Um, a third challenge was figuring out where to hold the workshop given that we did want that focus to be on research. So the first two times, we held the workshop at the Grace Hopper conference, but we started to feel that that wasn’t really the right venue given that we wanted that focus to be on research. So we ended up moving it to NeurIPS, and this had a bunch of benefits, some of which I don’t think we’d even fully thought through when we made that decision.

So one of the benefits was that attendees’ WiML travel funding—so we would give them this travel funding to enable them to pay the cost of attending WiML, stay in hotel rooms, all this kind of stuff—this would actually enable them to attend NeurIPS, as well, if we co-located with NeurIPS.

WORTMAN VAUGHAN: Yep.

WALLACH: Another main benefit was that we held WiML on the day before NeurIPS. So then throughout the rest of the conference, WiML attendees would see familiar faces throughout the crowd and wouldn’t necessarily feel so alone.

WORTMAN VAUGHAN: So you’re talking about these challenges. How have these challenges changed over time? Or, you know, more broadly, can you talk about how the workshop and Women in Machine Learning as an organization as a whole, kind of, evolved over the years? I know that you served a term as the WiML president.

WALLACH: Yeah. So it’s changed a lot. So first, obviously, most importantly, it evolved from being, kind of, this one-off event where we were just seeing what would happen to being really a robust organization. And the first step in that was creating the WiML board. And, as you just said, I served as the first president of that.

But there have been a bunch of other steps since then. And one of the things I want to flag about the WiML board was that this was really important because the board members could focus on the long-term health of the organization and these, sort of, like, you know, things that spanned multiple years, like how to get sustainable funding sources, this kind of thing, versus the actual workshop organizers, who would focus on things like running the call for submissions and stuff like that. And being able to separate those roles really reduced the burden on the workshop organizers and meant that we could take this, kind of, longer-term perspective.

Another really important step was becoming, officially becoming a non-profit. So that happened a few years ago. And again, it was the natural thing to do at that point in time and just another step towards creating this, sort of, durable, robust organization.

But it’s really taken on a life of its own. I’m honestly not super actively involved nowadays, which I think is fantastic. The organization doesn’t need me. That’s great. It’s also wild to me that, because it’s been around for 20 years at this point, there are women in the field who don’t know what it’s like to not have WiML.

So a bunch of other affinity groups got created. So Timnit Gebru cofounded Black in AI when she was actually a postdoc at Microsoft Research New York City. So you and I got to actually see the founding of that affinity group up close. And then now there are a ton of other affinity groups. So there’s LatinX in AI; there’s Queer in AI, Muslims in ML, Indigenous in AI and ML, New In ML, just to name a few.

WORTMAN VAUGHAN: Yeah, and all of these are growing, too, every year.

You know, this year, WiML had over 400 submissions. They accepted 250 to be presented. It’s amazing.

WALLACH: That’s wild.

WORTMAN VAUGHAN: Yeah, yep. And there’s going to be a WiML presence this year actually at all three of the NeurIPS venues. So there’s going to be a presence in Mexico City, in Copenhagen, and, of course, in San Diego for the main workshop. So it’s pretty great.

And, you know, on top of that, I think the organization now, as you were saying, is able to do so much more than just the workshop alone. So for instance, WiML now runs this worldwide mentorship program for women and nonbinary individuals in machine learning, where they’re matched with a mentor and they can participate in these one-to-one mentoring meetings and seminars and panel discussions, which happens all throughout the year. I think they have about 50 mentors signing up each year, but I’m sure they could always use more. Um, so it’s just really amazing to look back and see how much the WiML community has done and how much it’s grown.

And, you know, on the one hand, I think that honestly, like, founding WiML was one of the things that I’ve done over the course of my career, if not the thing, that I am most proud of …

WALLACH: Oh yeah, me, too.

WORTMAN VAUGHAN: … to this day, but at the same time, like, we can’t take credit for any of this. It’s, like, a community effort.

WALLACH: No.

WORTMAN VAUGHAN: It’s the community that has really kept us going …

WALLACH: Yes.

WORTMAN VAUGHAN: … for the last 20 years,

WALLACH: Yes.

WORTMAN VAUGHAN: … so it’s great. Going to stop gushing now, but it’s amazing.

WALLACH: And it’s not just WiML that’s changed over the years. The entire industry has changed a ton, as well.

How has your research evolved as a result of these changes to the entire field of AI and machine learning and also from your own change from academia to industry?

WORTMAN VAUGHAN: It’s a great question. You know, we’ve touched on this a little bit, but our research paths really evolved differently but ended up in these very similar places where we’re working on responsible AI, we’re advocating for interdisciplinary approaches, incorporating techniques from HCI, and so on. And I think that part of this was because of shifts of the community and also what’s happening in industry. Working in responsible AI in industry, there’s definitely not ever a shortage of interesting problems to solve, right.

And I think that for both of us, our research interests in recent years really have been driven by these really practical challenges that we’re seeing. We were both involved early on in defining what responsible AI means within Microsoft, shaping our internal Responsible AI Standard. I led this internal companywide working group on AI transparency, which was focused both on model interpretability like we were talking about earlier but also other forms of transparency like data sheets for datasets and the transparency notes that Microsoft now releases with all of our products. And at the same time, you were leading this internal working group on fairness.

WALLACH: Yeah, taking on that internal working group was, kind of, a big transition point in my career. You know, when I joined Microsoft, I was focusing on computational social science and I was also entirely doing research and wasn’t really that involved in stuff in the rest of the company.

Then at the end of my first year at Microsoft, I attended the first Fairness, Accountability, and Transparency in Machine Learning workshop, which was co-located with NeurIPS. It was one of the NeurIPS workshops. And I got really excited about that and thought, great, I’m going to spend like 20% of my time, maybe one day a week, doing research on topics in the space of fairness and accountability and transparency. Um, that is not what ended up happening.

Over the next couple of years, I ended up doing more and more research on responsible AI, you know, as you said, on topics to do with fairness, to do with interpretability. And then in early 2018, I was asked to co-chair this internal working group on fairness, and that was the point where I started getting much more involved in responsible AI stuff across Microsoft, so outside of just Microsoft Research.

And this was really exciting to me because responsible AI was so new, which meant that research had a really big role to play. It wasn’t like this was kind of an established area where folks in engineering and policy knew exactly what they were doing. And so that meant that I got to branch out from this very, sort of, research-focused work into much more applied work in collaboration with folks from policy, from engineering, and so on.

Now, in fact, as well as being a researcher, I actually run a small applied science team, the Sociotechnical Alignment Center, or STAC for short, within Microsoft Research that focuses specifically on bridging research and practice in responsible AI.

WORTMAN VAUGHAN: Yeah. Do you think that your involvement in WiML has played a role in this work?

WALLACH: Yes, definitely. [LAUGHS] Yeah, without a doubt. So particularly when working on topics related to fairness, I’ve ended up focusing a bunch on stuff to do with marginalized groups as part of my responsible AI work.

So there’s been this, sort of, you know, focus on marginalized groups, particularly women, in the context of machine learning and with my WiML, kind of, work and then in my research work thinking about fairness, as well.

The other way that WiML has really, sort of, affected what I do is that I work with a much more varied group of people nowadays than I did back when I was just focusing on, kind of, machine learning, computational social science, and stuff like that. And many of my collaborators are people that I’ve met through WiML over the years.

WORTMAN VAUGHAN: And, of course, there has been another big shift within industry recently, which is just all the excitement around generative AI. Can you say a bit about how that has changed your research?

WALLACH: OK, yeah. So this is another big one. There are so many ways that this changed my work. One of the biggest ways, though, is that generative AI systems are now everywhere. They’re being used all over the place for all kinds of things. And, you know, you see all these news headlines about GenAI systems, you know, diagnosing illnesses, solving math problems, and writing code, stuff like that. And also headlines about various different risks that can occur when you’re using generative AI. So fabricating facts, memorizing copyrighted data, generating harmful content, you know, these kinds of things. And with all this attention, it’s really natural to ask, what is the evidence behind these claims? So where is this evidence coming from, and should we trust it?

It turns out that much of the evidence comes from GenAI evaluations that involve measuring the capabilities, the behaviors, and the impacts of GenAI systems, but the current evaluation practices that are often used in the space don’t really have as much scientific rigor as we would like, and that’s, kind of, a problem.

So one of the biggest challenges is that the concepts of interest when people are, sort of, doing these GenAI evaluations—so things like diagnostic ability, memorization, harmful content, concepts like that—are much more abstract than the concepts like prediction accuracy that underpinned machine learning evaluations before the generative AI era.

And when we look at these new concepts that we need to be able to focus on in order to evaluate GenAI systems, we see that they’re actually much more reminiscent of these abstract contested concepts—these, kind of, fuzzy, squishy concepts—that are studied in the social sciences. So things like democracy and political science or personality traits and psychometrics. So there’s really that, sort of, connection there to these, kind of, squishier things.

So when I was focusing primarily on computational social science, most of my work was focused on developing machine learning methods to help social scientists measure abstract contested concepts. So then when GenAI started to be a big thing and I saw all of these evaluative claims involving measurements of abstract concepts, it seemed super clear to me that if we were going to actually be able to make meaningful claims about what AI can do and can’t do, we’re going to need to take a different approach to GenAI evaluation.

And so I ended up, sort of, drawing on my computational social science work around measurement and I started advocating for adopting a variant of the framework that social scientists use for measuring abstract contested concepts. And my reason for doing this was that I believed—I still believe—that this is an important way to improve the scientific rigor of GenAI evaluations.

You know all of this, of course, because you and I, along with a bunch of other collaborators at Microsoft Research and Stanford and the University of Michigan published a position paper on this framework entitled “Evaluating GenAI Systems is a Social Science Measurement Challenge” at ICML [International Conference on Machine Learning] this past summer.

What are you excited about at the moment?

WORTMAN VAUGHAN: Yeah, so lately, I have been spending a lot of time thinking about AI and critical thought: how can we design AI systems to support appropriate reliance, preserve human agency, and really encourage critical engagement on the part of the human, right?

So this is an area where I think we actually have a huge opportunity, but there are also huge risks. If I think about my most optimistic possible vision of the future of AI—which is not something that is easy for me to do, as I’m not a natural optimist, as you know—it would be a future in which AI helps people grow and flourish, in which it, kind of, enriches our own human capabilities. It deepens our own human thinking and safeguards our own agency.

So in this future, you know, we could build AI systems that actually help us brainstorm and learn new knowledge and skills, both in formal educational settings and in our day-to-day work, as well. But I think we’re not going to achieve this future by default. It’s something that we really need to design for if we want to get there.

WALLACH: You mentioned that there are risks. What are the risks that you can see here?

WORTMAN VAUGHAN: Yeah, there’s so much at stake here. You know, in the short term, there are things like overreliance—depending on the output of an AI system even when the system’s wrong. This is something that I’ve worked on a bunch myself. There’s a risk of loss of agency, or the ability to make and execute independent decisions and to ensure that the outcomes of AI systems are aligned with the personal or professional values of the humans who are using those systems. This is something that I’ve been looking at recently in the context of AI tools for journalism. There’s diminished innovation, by which I mean a loss of creativity or diversity of ideas.

You know, longer term, we risk atrophied skills—people just losing or simply never developing helpful skills for their career or their life because of prolonged use of AI systems. The famous example that people often bring up here is pilots losing the ability to perform certain actions in flight because of dependence on autopilot systems. And I think we’re already starting to see the same sort of thing happen across all sorts of fields because of AI.

And, you know, finally, another risk that I’ll mention that seems to resonate with a lot of folks I talk to is what I would just call loss of joy, right. What happens when we delegate to AI systems the parts of our activities that we really take pleasure in and find satisfaction in doing ourselves?

WALLACH: So then as a community, what should we be doing if we’re worried about these risks?

WORTMAN VAUGHAN: Yeah, I mean, I think this is going to have to be a big community effort if we want to achieve this. This is a big goal. But there are a few places I think we especially need work.

So I think we need generalized principles and practices for AI system builders for how they can build AI systems in ways that promote human agency and encourage critical thought. We also need principles and practices for system users. So how do we teach the general population to use AI in ways that amplify their skills and capabilities and help them learn new things?

And then, you know, close to your heart, I’m sure, I think that we need more work on measurement and evaluation, right. We are once again back to these squishy human properties.

You know, I mentioned I’ve done some work on overreliance in generative AI systems, and I started there because on the grand scale of risks here, overreliance is something that is relatively easy to measure, at least in the short term. But how do we start thinking about measuring people’s critical thinking when using AI across all sorts of contexts and at scale and over long time horizons, right? How do we measure the, sort of, longitudinal effect of AI systems just on our critical thought as a population?

And by the way, if anyone listening is going to be at the WiML workshop, I’ll actually be giving a keynote on this topic. And this is something I’m just incredibly excited about because first, I’m incredibly excited about this topic, but also, in the whole 20 years of WiML, I’ve given opening remarks and similar several times, but this is actually the very first time that I will be talking about my own research there. So this is like my dream. I’m thrilled that this is happening.

WALLACH: That’s awesome. Oh, that’s so exciting. Excellent.

So one last question for you. If you could go back and talk to yourself 20 years ago and give yourself some advice, what would you say?

WORTMAN VAUGHAN: Yeah, OK, I’ve thought about this one a bit over the past week, and there are three things here I want to mention.

So first, I would tell myself to be brave about speaking up. You know I’m about as introverted as it gets and I’m naturally very shy, and this has always held me back. It still holds me back now. It was really embarrassingly late in my career that I decided to do something about this and start to develop strategies to help myself speak up more. And eventually, it started to grow into something that’s a little bit more natural.

WALLACH: What kind of, um, what kind of strategies?

WORTMAN VAUGHAN: Yeah, so you know, one example is I use a lot of notes. For this podcast, I have a lot of notes here. I’m a big notes person, and things like that really help me.

The second thing that I would tell myself is to, you know, work on the problems that you really want to see solved. As researchers, we have this amazing freedom to choose our own direction. And early on, you know, a lot of the problems that I worked on were problems that I really enjoyed thinking about on a day-to-day basis. It was a lot of fun. They were like little math puzzles to me. But I often found that, you know, when I would be at conferences and people would ask me about my work, I didn’t really want to talk about these problems. I just in some sense, you know, I had fun doing it, but I didn’t really care. I wasn’t passionate about it. I didn’t care that I had solved the problem.

And so once, many years ago now, when I was thinking about my research agenda, I got some good advice from our former lab director, Jennifer Chayes, who suggested that I go through my recent projects and sort them into projects where I really liked working on them—it was a fun experience day-to-day—and projects that I liked talking about after the fact and, kind of, felt good about the results and then see where the overlap is. And this is something that, like, it kind of sounds, kind of, obvious when I say it now, but at the time, it was really eye-opening for me.

WALLACH: That’s so cool. And now I, kind of, want to do that with all of my projects, particularly at the moment. I actually just took five months, as you know, five months off of work for parental leave because I just had a baby. And so I’m, sort of, taking a big, kind of, inventory of everything as I get back into all of this now, and I love this idea. I think this is really cool.

WORTMAN VAUGHAN: It’s changed really my whole approach to research. Like, you know, we were talking about this, but most of the work I do now is more HCI than machine learning because I found that the problems that really motivate me, that I want to be talking to people about at conferences, are the people problems.

The third piece of advice I would give myself is that you should bring more people into your work, right.

So there’s this kind of vision on the outside of research being this solo endeavor, and it can feel so competitive at times, right. We all feel this. But time and time again, I’ve seen that the best research comes from collaborations and from bringing people together with diverse perspectives who can challenge each other in a way that is respectful but makes the work better.

Is there advice that you would give to your former self of 20 years ago?

WALLACH: Yeah. OK. So I’ve also been thinking about this a bunch over the past week. There’s actually a lot of advice I think I would give my former self, [LAUGHS] but there are three things that I keep coming back to.

OK, so first—and this is similar to your second point—push for doing the work that you find to be most fulfilling even if that means taking a nontraditional path. So in my case, I’ve always been interested in the social sciences. Back when I was a student, you know, even when I was a PhD student, doing research that combined computer science and the social sciences just wasn’t really a thing. And so as a result, it would have been really easy for me to just be like, “Oh well, I guess that isn’t possible. I’ll just focus on traditional computer science problems.”

But that’s not what I ended up doing. Instead, and often in ways that made my career, kind of, harder than it probably would have been otherwise, I ended up pushing. I kept pushing, and in fact, I keep pushing, even nowadays, to bring these things together—computer science and the social sciences—in an interdisciplinary fashion. And this hasn’t been easy. But cumulatively, the effect has been that I’ve been able to do much more impactful work than I think I would have been able to do otherwise, and the work I’ve done, I’ve just enjoyed so much more than would otherwise have been the case.

OK, so second, be brave and share your work. So this is actually advice for my current self and my former self, as this is something that I definitely still struggle with.

WORTMAN VAUGHAN: As do I, you know, and actually, I think it’s funny to hear you say this because I would say that you are much better at this than I am.

WALLACH: I still, I think I have a lot of work to do on this one. Yeah, it’s hard. It’s really hard.

As you know, I am a perfectionist, and this is good in some ways, but this is also bad in other ways. And one way in which this is bad is that I tend to be really anxious about sharing and publicizing my work, especially when I feel it’s not perfect.

So as an example, I wrote this massive tutorial on computational social science for ICML in 2015, but I never put the slides … and I wrote a whole script for it … I never put the slides or the script online as a resource for others because I felt it needed more work. And I actually went back and looked at it earlier this year, when we were working on the ICML paper, and I was stunned because it’s great. Why didn’t I put this online? All these things that I thought were problems 10 years ago, no, they’re not a big deal. I should have just shared it.

As another example, STAC, my applied science team, was using LLMs as part of our approach to GenAI evaluation back in 2022, way before the sort of “LLM-as-a-judge” paradigm was widespread. But I was really worried that others would think negatively of us for doing this, so we didn’t share that much about what we were doing, and I regret that because we missed out on an opportunity to kick off an industrywide discussion about this, sort of, LLM-as-a-judge paradigm.

OK, so then my third point is that the social side of research is just as valuable as the technical side. And by this, I’m actually not talking about social science and computer science. I actually mean that the how of doing research, including who you talk to, who you collaborate with, and how you approach those interactions, is just as important as the research itself.

As a PhD student, I felt really bad about spending time socializing with other researchers, especially at conferences, because I thought that I was supposed to be listening to talks, reading papers, and discussing technical topics with researchers and not socializing. But in hindsight, I think that was wrong. Many of those social connections have ended up being incredibly valuable to my research, both because I’ve ended up collaborating with and in some cases even hiring the people who I first got to know socially …

WORTMAN VAUGHAN: Yeah.

WALLACH: … but also because the friendships that I’ve built, like our friendship, for example, have served as a crucial support network over the years, especially when things have felt particularly challenging.

WORTMAN VAUGHAN: Yeah, absolutely. I agree with all of that so much.

And with that, I will say thank you so much for doing this podcast with me today.

WALLACH: Thank you.

WORTMAN VAUGHAN: It was a lot of fun to reflect on the last 20 years of WiML, but also the last 20 years of our careers and friendship and all of this, so it’s great, and I never would have agreed to do this if it had been with anyone but you.

WALLACH: Likewise. [LAUGHS]

So thank you, everybody, for listening to us, and hopefully some of you will join us for the 20th annual workshop for Women in Machine Learning, which is taking place on Dec. 2. And of course, Jenn and I will both be there in person. We’ll also be at NeurIPS afterwards. So feel free to reach out to us if you want to chat with us or to learn more about anything that we covered here today.

[MUSIC]

OUTRO: You’ve been listening to Ideas, a Microsoft Research Podcast. Find more episodes of the podcast at aka.ms/researchpodcast.

[MUSIC FADES]


[1] Wallach later clarified that the number of registrants for the 2005 Conference on Neural Information Processing Systems was around 900.

The post Ideas: Community building, machine learning, and the future of AI appeared first on Microsoft Research.

AI Toolkit + Copilot - Pt. 3: Create an Agent with Tools

From: Microsoft Developer
Duration: 16:30
Views: 36

This video is Part 3 of the AI Toolkit + Copilot video series and is part of the Copilot + AI Toolkit Pet Planner workshop. View the repo and instructions: https://aka.ms/AIToolkit/workshop

Join April as she creates an agent in the AI Toolkit’s Agent Builder that’s equipped with a custom local MCP server. The agent can leverage tools from the local MCP server to support its generated output.
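
For readers following along outside the video, here is a minimal sketch of what a custom local MCP server can look like, assuming the official MCP Python SDK (the "mcp" package) and its FastMCP helper. The server name and the suggest_daily_walks tool are hypothetical stand-ins for illustration, not the actual Pet Planner workshop tools, which live in the repo linked below.

```python
# Minimal sketch of a custom local MCP server, assuming the official
# MCP Python SDK ("mcp" package) with its FastMCP helper is installed.
# The tool below is hypothetical; it only illustrates the shape of a
# tool an Agent Builder agent could call. The real workshop tools are
# in the repo at https://aka.ms/AIToolkit/workshop.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("pet-planner-tools")  # server name is illustrative


@mcp.tool()
def suggest_daily_walks(breed: str, age_years: int) -> str:
    """Return a rough daily-walk recommendation for a dog."""
    minutes = 30 if age_years >= 8 else 60
    return (
        f"A {breed} around {age_years} years old typically needs "
        f"~{minutes} minutes of walking per day."
    )


if __name__ == "__main__":
    # Local MCP clients such as the AI Toolkit typically launch the
    # server over stdio.
    mcp.run(transport="stdio")
```

Once a server like this is registered with the agent, the tools it exposes become available for the agent to call when generating a response.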

Install the AI Toolkit: https://aka.ms/AIToolkit
Set up your Microsoft Foundry project: https://ai.azure.com

Learn more about Microsoft Foundry Model and Tools announcements at https://aka.ms/model-mondays

Join the Discord: https://aka.ms/insideMF/discord
Hop on Forum: https://aka.ms/insideMF/forum

Chapter Markers

00:00 - 00:02 - Introduction
00:03 - 03:45 - Create an agent
03:46 - 08:37 - Create a custom local MCP server
08:38 - 09:56 - Set up environment to run the local MCP server
09:57 - 16:28 - Run the local MCP server and test with the agent

AI Toolkit + Copilot - Pt. 4: Generate Agent Code

From: Microsoft Developer
Duration: 10:07
Views: 3

This video is Part 4 of the AI Toolkit + Copilot video series and is part of the Copilot + AI Toolkit Pet Planner workshop. View the repo and instructions: https://aka.ms/AIToolkit/workshop

Join April as she transitions from creating an agent in the AI Toolkit’s Agent Builder to generating the code for that agent. Once the code is generated, April shows you how to set up your local environment to run it. With the generated code, developers can continue refining their agent with additional logic and leverage Copilot for assistance.
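
The exact code you get depends on the model and project you configured, but as a rough, illustrative sketch (not the Agent Builder's actual output), locally runnable agent code tends to follow the shape below. It uses the OpenAI Python client against an OpenAI-compatible endpoint; the environment-variable names, model name, and prompt are assumptions made for this example.

```python
# Illustrative sketch only -- not the code the Agent Builder generates.
# It shows the general shape of agent code run locally after export:
# read credentials from the environment, send the agent's system prompt
# plus a user message to a chat model, and print the reply.
# FOUNDRY_ENDPOINT, FOUNDRY_API_KEY, and FOUNDRY_MODEL are hypothetical
# names used for this sketch.
import os

from openai import OpenAI

client = OpenAI(
    base_url=os.environ["FOUNDRY_ENDPOINT"],  # hypothetical env var
    api_key=os.environ["FOUNDRY_API_KEY"],    # hypothetical env var
)

SYSTEM_PROMPT = "You are a pet-care planning assistant."  # placeholder prompt


def run_agent(user_message: str) -> str:
    response = client.chat.completions.create(
        model=os.environ.get("FOUNDRY_MODEL", "gpt-4o-mini"),  # placeholder model name
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_message},
        ],
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    print(run_agent("Plan a week of meals and walks for a 3-year-old beagle."))
```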

Install the AI Toolkit: https://aka.ms/AIToolkit
Set up your Microsoft Foundry project: https://ai.azure.com

Learn more about Microsoft Foundry Model and Tools announcements at https://aka.ms/model-mondays

Join the Discord: https://aka.ms/insideMF/discord
Hop on Forum: https://aka.ms/insideMF/forum

Chapter Markers

00:00 - 00:02 - Introduction
00:03 - 01:20 - Generate the agent code
01:21 - 03:06 - Review the agent code
03:07 - 04:33 - Set up environment to run the agent code
04:34 - 11:36 - Run the agent code

AI Toolkit + Copilot - Pt. 5: Add Tracing to an Agent

From: Microsoft Developer
Duration: 7:41
Views: 8

This video is Part 5 of the AI Toolkit + Copilot video series and is part of the Copilot + AI Toolkit Pet Planner workshop. View the repo and instructions: https://aka.ms/AIToolkit/workshop

Join April as she demonstrates how to use Copilot in Agent mode to add tracing to an agent. Copilot leverages AI Toolkit tools to both set up your agent file and start collecting traces. Tracing enables you to view the steps an agent took to generate its output. Once tracing is enabled, developers can view traces in the AI Toolkit’s Tracing viewer.
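
If you want a sense of what tracing instrumentation looks like in code, here is a generic OpenTelemetry sketch rather than the exact changes Copilot makes for you. The service name, span name, attributes, and the local OTLP endpoint are assumptions; in practice you would point the exporter at wherever your tracing viewer or collector is listening.

```python
# Generic OpenTelemetry sketch, not the exact setup Copilot adds.
# It exports spans over OTLP/HTTP to a local endpoint (assumed below);
# point it at wherever the AI Toolkit's Tracing viewer or your own
# collector is listening.
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

provider = TracerProvider(
    resource=Resource.create({"service.name": "pet-planner-agent"})  # illustrative name
)
provider.add_span_processor(
    BatchSpanProcessor(
        OTLPSpanExporter(endpoint="http://localhost:4318/v1/traces")  # assumed endpoint
    )
)
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("pet-planner-agent")


def run_agent(user_message: str) -> str:
    # Wrap the agent call in a span so each run shows up in the trace viewer.
    with tracer.start_as_current_span("agent.run") as span:
        span.set_attribute("agent.input", user_message)
        reply = "...model call goes here..."  # placeholder for the real call
        span.set_attribute("agent.output", reply)
        return reply
```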

Install the AI Toolkit: https://aka.ms/AIToolkit
Set up your Microsoft Foundry project: https://ai.azure.com

Learn more about Microsoft Foundry Model and Tools announcements at https://aka.ms/model-mondays

Join the Discord: https://aka.ms/insideMF/discord
Hop on Forum: https://aka.ms/insideMF/forum

Chapter Markers

00:00 - 00:02 - Introduction
00:03 - 01:21 - Recap of current progress
01:22 - 04:09 - Enable tracing with Copilot
04:10 - 04:34 - Run the agent to collect traces
04:35 - 07:39 - Review traces in Tracing viewer

AI Toolkit + Copilot - Pt. 6: Evaluate Agent Output

From: Microsoft Developer
Duration: 22:44
Views: 4

This video is Part 6 of the AI Toolkit + Copilot video series and is part of the Copilot + AI Toolkit Pet Planner workshop. View the repo and instructions: https://aka.ms/AIToolkit/workshop

Join April as she demonstrates how to use Copilot in Agent mode to prepare for evaluating an agent’s output. Copilot leverages AI Toolkit tools to help developers choose evaluators, create a dataset, and create an evaluation script to evaluate agent output.
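
As a rough idea of what such an evaluation script can boil down to, here is a minimal sketch that is not the AI Toolkit's built-in evaluation flow: it reads a small JSONL dataset, calls a run_agent function you supply, and scores each reply with a simple keyword-coverage check. The file format, field names, and scoring rule are all assumptions made for illustration.

```python
# Minimal sketch of an evaluation script, not the AI Toolkit's built-in
# evaluators. It runs the agent over a small JSONL dataset and scores
# each reply with a trivial keyword-coverage evaluator; the file name,
# fields, and scoring rule are assumptions for illustration.
import json


def keyword_coverage(reply: str, expected_keywords: list[str]) -> float:
    """Fraction of expected keywords that appear in the agent's reply."""
    reply_lower = reply.lower()
    hits = sum(1 for kw in expected_keywords if kw.lower() in reply_lower)
    return hits / len(expected_keywords) if expected_keywords else 1.0


def evaluate(dataset_path: str, run_agent) -> None:
    """Score run_agent against every case in a JSONL dataset and print a summary."""
    scores = []
    with open(dataset_path, encoding="utf-8") as f:
        for line in f:
            # Each line is assumed to look like:
            # {"input": "...", "expected_keywords": ["walk", "diet"]}
            case = json.loads(line)
            reply = run_agent(case["input"])
            score = keyword_coverage(reply, case["expected_keywords"])
            scores.append(score)
            print(f"{score:.2f}  {case['input'][:60]}")
    print(f"Mean coverage: {sum(scores) / len(scores):.2f} over {len(scores)} cases")
```

In the video, Copilot also turns results like these into an evaluation report with recommendations, which is a step beyond what this bare-bones sketch does.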

Install the AI Toolkit: https://aka.ms/AIToolkit
Set up your Microsoft Foundry project: https://ai.azure.com

Learn more about Microsoft Foundry Model and Tools announcements at https://aka.ms/model-mondays

Join the Discord: https://aka.ms/insideMF/discord
Hop on Forum: https://aka.ms/insideMF/forum

Chapter Markers

00:00 - 00:02 - Introduction
00:03 - 01:19 - Recap of current progress
01:20 - 02:57 - Choose evaluators with Copilot
02:58 - 07:00 - Create a dataset with Copilot
07:01 - 16:50 - Review evaluation plan and create evaluation script
16:51 - 18:50 - Review evaluation output
18:51 - 22:41 - Use Copilot to create an evaluation report with recommendations
