Sr. Content Developer at Microsoft, working remotely in PA, TechBash conference organizer, former Microsoft MVP, Husband, Dad and Geek.

Trump Administration Begins Refunding $166 Billion In Tariffs

"After a Supreme Court of the United States ruling in Feb. 2026, many tariffs imposed by the Trump administration were declared illegal because the president overstepped his authority," writes Slashdot reader hcs_$reboot. "As a result, the U.S. government now has to refund a massive amount of money, around $160-170+ billion, paid mainly by importers." According to the New York Times, the administration has now begun accepting refund requests, "surrendering its prized source of revenue -- plus interest." From the report: For some U.S. businesses, the highly anticipated refunds could be substantial, offering critical if belated financial relief. Tariffs are taxes on imports, so the president's trade policies have served as a great burden for companies that rely on foreign goods. Many have had to choose whether to absorb the duties, cut other costs or pass on the expenses to consumers. By Monday morning, those companies can begin to submit documentation to the government to recover what they paid in illegal tariffs. In a sign of the demand, more than 3,000 businesses, including FedEx and Costco, have already sued the Trump administration in a bid to secure their refunds, with some cases filed even before the Supreme Court's ruling. But only the entities that officially paid the tariffs are eligible to recover that money. That means that the fuller universe of people affected by Mr. Trump's policies -- including millions of Americans who paid higher prices for the products they bought -- are not able to apply for direct relief. The extent to which consumers realize any gain hinges on whether businesses share the proceeds, something that few have publicly committed to do. Some have started to band together in class-action lawsuits in the hopes of receiving a payout. Many business owners said they weren't sure how easy the tariff refund process would be, particularly given Mr. Trump's stated opposition to returning the money. The administration has suggested that it may be months before companies see any money. Adding to the uncertainty, the White House has declined to say if it might still try to return to court in a bid to halt some or all of the refunds. The money will mostly go to importers and companies, since they were the ones that directly paid the tariffs. While individual refunds with interest could take around 60 to 90 days to process, the overall effort will probably move much more slowly because of how large and complicated it will be. There are also legal questions around whether companies would have to pass any of that money on to consumers. Slashdot reader AmiMoJo commented: "This is perhaps the biggest transfer of wealth in American history. Most of those companies will just pocket the refund and not pass any of it on to the consumer. If prices go down at all, they won't be back to pre-tariff levels. You paid the tariffs, but you ain't getting the refund."



Changes to GitHub Copilot Individual plans


Today we’re making the following changes to GitHub Copilot’s Individual plans to protect the experience for existing customers: pausing new sign-ups, tightening usage limits, and adjusting model availability. We know these changes are disruptive, and we want to be clear about why we’re making them and how they will affect you.

Agentic workflows have fundamentally changed Copilot’s compute demands. Long-running, parallelized sessions now regularly consume far more resources than the original plan structure was built to support. As Copilot’s agentic capabilities have expanded rapidly, agents are doing more work, and more customers are hitting usage limits designed to maintain service reliability. Without further action, service quality degrades for everyone.

We’ve heard your frustrations about usage limits and model availability, and we need to do a better job communicating the guardrails we are adding—here’s what’s changing and why.

  1. New sign-ups for GitHub Copilot Pro, Pro+, and Student plans are paused. Pausing sign-ups allows us to serve existing customers more effectively.
  2. We are tightening usage limits for individual plans. Pro+ plans offer more than 5X the limits of Pro. Users on the Pro plan who need higher limits can upgrade to Pro+. Usage limits are now displayed in VS Code and Copilot CLI to make it easier for you to avoid hitting these limits.
  3. Opus models are no longer available in Pro plans. Opus 4.7 remains available in Pro+ plans. As we announced in our changelog, Opus 4.5 and Opus 4.6 will be removed from Pro+.

These changes are necessary to ensure we can serve existing customers with a predictable experience. If you hit unexpected limits or these changes just don’t work for you, you can cancel your Pro or Pro+ subscription and you will not be charged for April usage. Please reach out to GitHub support between April 20 and May 20 for a refund.

How usage limits work in GitHub Copilot

GitHub Copilot has two usage limits today: session and weekly (7 day) limits. Both limits depend on two distinct factors—token consumption and the model’s multiplier.

The session limits exist primarily to ensure that the service is not overloaded during periods of peak usage. They’re set so most users shouldn’t be impacted. Over time, these limits will be adjusted to balance reliability and demand. If you do encounter a session limit, you must wait until the usage window resets to resume using Copilot.

Weekly limits represent a cap on the total number of tokens a user can consume during the week. We introduced weekly limits recently to control for parallelized, long-trajectory requests that often run for extended periods of time and result in prohibitively high costs.

The weekly limits for each plan are also set so that most users will not be impacted. If you hit a weekly limit and have premium requests remaining, you can continue to use Copilot with Auto model selection. Model choice will be reenabled when the weekly period resets. If you are a Pro user, you can upgrade to Pro+ to increase your weekly limits. Pro+ includes over 5X the limits of Pro.

Usage limits are separate from your premium request entitlements. Premium requests determine which models you can access and how many requests you can make. Usage limits, by contrast, are token-based guardrails that cap how many tokens you can consume within a given time window. You can have premium requests remaining and still hit a usage limit.
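To make that distinction concrete, here is a minimal sketch of the accounting described above. The caps, multipliers, and field names are hypothetical (GitHub does not publish exact figures in this post); the point is simply that weighted token consumption can trip a session or weekly guardrail while premium requests remain.

```python
# Illustrative sketch only: the caps, multipliers, and entitlements below are
# hypothetical, not GitHub's actual numbers. It models the accounting described
# above: token consumption scaled by a model multiplier is charged against
# session and weekly windows, while premium requests are a separate counter.

from dataclasses import dataclass


@dataclass
class UsageTracker:
    session_cap: int = 100_000        # hypothetical token budget per session window
    weekly_cap: int = 1_000_000       # hypothetical token budget per 7-day window
    premium_requests: int = 300       # hypothetical premium request entitlement
    session_used: int = 0
    weekly_used: int = 0
    premium_used: int = 0

    def record(self, tokens: int, model_multiplier: float, premium: bool) -> str:
        """Charge one request against the limits and report which guardrail, if any, is hit."""
        weighted = int(tokens * model_multiplier)  # higher multiplier burns the budget faster
        self.session_used += weighted
        self.weekly_used += weighted
        if premium:
            self.premium_used += 1

        if self.session_used > self.session_cap:
            return "session limit hit: wait for the usage window to reset"
        if self.weekly_used > self.weekly_cap:
            return "weekly limit hit: Auto model selection only until the week resets"
        if self.premium_used > self.premium_requests:
            return "premium request entitlement exhausted"
        return "ok"


tracker = UsageTracker()
# Parallelized, long-running work on a high-multiplier model hits a token
# guardrail long before the separate premium request entitlement runs out.
for _ in range(12):
    print(tracker.record(tokens=30_000, model_multiplier=3.0, premium=True))
```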

Avoiding surprise limits and improving our transparency

Starting today, VS Code and Copilot CLI both display your available usage when you’re approaching a limit. These changes are meant to help you avoid a surprise limit.

Usage limits in VS Code: a message appears that says "You've used over 75% of your weekly usage limit. Your limit resets on Apr 27 at 8:00 PM."

Usage limits in Copilot CLI: a message appears that says "You've used over 75% of your weekly usage limit. Your limit resets on Apr 24 at 3 PM."

If you are approaching a limit, there are a few things you can do to help reduce the chances of hitting it:

  • Use a model with a smaller multiplier for simpler tasks. The larger the multiplier, the faster you will hit the limit.
  • Consider upgrading to Pro+ if you are on a Pro plan to raise your limit by over 5X.
  • Use plan mode (VS Code, Copilot CLI) to improve task efficiency. Plan mode also improves task success.
  • Reduce parallel workflows. Tools such as /fleet will result in higher token consumption and should be used sparingly if you are nearing your limits.

Why we’re doing this

We’ve seen usage intensify for all users as they realize the value of agents and subagents in tackling complex coding problems. These long-running, parallelized workflows can yield great value, but they have also challenged our infrastructure and pricing structure: it’s now common for a handful of requests to incur costs that exceed the plan price! These are our problems to solve. The actions we are taking today enable us to provide the best possible experience for existing users while we develop a more sustainable solution.



1.0.34


2026-04-20

  • Rate limit error message now says "session rate limit" instead of "global rate limit"

How do you decide what your AI eval should measure?


Imagine you’re a seller using an AI agent to draft outreach emails to prospects. The agent produces an email, but how do you know if it’s good enough to send? And more importantly, how does the team evaluating that agent decide what ‘good’ even means?

AI agents are becoming part of enterprise workflows, and with that comes a growing focus on how we evaluate them. But the question of what to measure for the human experience deserves more attention. It is especially challenging because agent outputs are probabilistic: the same agent can produce different results each time, which adds a layer of complexity to evaluation.

It’s easy to default to standard metrics like task completion, accuracy, and hallucination rate. These are useful, but they skip a question: how do you know those are the right things to measure for your agent, for your users?

On our team, we’ve been working through this across multiple agent evaluations, and we’ve landed on a simple framework that helps us get to metrics that reflect what users care about. Here’s the approach, in three steps.

1. User’s job to be done: Understanding what the user is trying to do

It starts with understanding what the user is trying to accomplish. What task are they using this agent for? What does success look like? What are they relying on the agent to get right? And what happens when the agent fails?

Together, these give you a fuller picture of what is worth measuring: what quality looks like when things go well, where the stakes are highest when they don’t, and what users will notice day to day.

This is where UX research plays a critical role. Teams building AI agents have a view of what to measure, but that view is shaped by what they built, not by how people use it. Research surfaces the difference. Understanding user workflows, where the agent fits and where it falls short, reveals evaluation dimensions that teams wouldn’t have identified otherwise.

When you do this, you start to see where the agent can fall short. But not all failures carry the same weight. Some are annoying: the tone is off, or the formatting isn’t quite right. Others can break the experience entirely: wrong data pulled before a client meeting, or a recommendation that doesn’t apply to the user’s situation. Being able to distinguish between the two helps you focus your evaluation on the dimensions that matter most to users.

For example, if a seller is using an AI agent to draft outreach emails, the job to be done is reaching a prospect with something relevant enough to get a response. Success means the email is personalized, references the prospect’s context, and has a clear call to action. On the failure side, a typo is annoying, but an email that sounds generic or misrepresents the prospect’s context carries more weight. Similarly, if a business user is using an agent to research a company’s financials for a procurement decision, success means the right data is surfaced clearly. A slow response is inconvenient, but inaccurate numbers could lead to a bad business decision.

2. Define metrics: Translating what you learned into measurable dimensions

The next step is translating what you’ve learned into measurable dimensions. That’s essentially what a metric is: a specific quality dimension you’re going to evaluate.

Instead of picking metrics from a standard list, we work backward from what we learned about users. What are the specific things that need to go right for this agent to be useful? Those become the metrics. Sometimes they map to familiar labels, sometimes they need new ones. But common metric labels like “accuracy” or “relevance” can mean very different things depending on the agent. Same word, different evaluation criteria entirely.

For example, accuracy for an agent drafting sales emails might mean it used the right company data and didn’t fabricate details. For a finance agent, it might mean the numbers are computationally correct. With relevance, for the sales agent, it might mean the email is tailored to the prospect’s industry and role. For the finance agent, it might mean the financial data surfaced matches the specific business question being asked.
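To illustrate that point, here is a small, hypothetical sketch; the agent names and criteria strings are invented for this example, not taken from any real evaluation.

```python
# Invented agents and criteria, purely to illustrate that a metric is a named
# dimension whose evaluation criteria are agent-specific, not a universal label.

METRICS = {
    "sales_outreach_agent": {
        "accuracy": "Used the right company data and did not fabricate details.",
        "relevance": "Email is tailored to the prospect's industry and role.",
    },
    "finance_research_agent": {
        "accuracy": "Numbers are computationally correct and match the source data.",
        "relevance": "The data surfaced matches the specific business question asked.",
    },
}

# Same metric name, different evaluation criteria depending on the agent.
for agent, metrics in METRICS.items():
    print(agent, "->", metrics["accuracy"])
```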

3. Define rubrics: Using rubrics to align on quality

A metric gives you the dimension. A rubric gives you the scoring criteria. It defines what each level means, ideally with concrete descriptions and examples.

For the sales outreach agent, if you’re scoring personalization on a 1 to 5 scale, a 5 might mean the email references the prospect’s role, recent activity, and tailors the value proposition to their situation. A 2 might mean it uses their name but nothing else specific to them. For the finance agent, scoring accuracy on the same scale, a 5 might mean all numbers match the source data and are presented in the right context. A 2 might mean the totals are correct but key details are missing or pulled from the wrong time period.
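One way to make rubrics like these concrete enough to score against is to write the level anchors down as data. The sketch below is illustrative only; the dimensions, level descriptions, and helper function are assumptions, not an actual production rubric.

```python
# Illustrative only: the 1-5 anchors below are hypothetical examples in the
# spirit of the rubrics described above, not a real production rubric.

PERSONALIZATION_RUBRIC = {
    5: "References the prospect's role and recent activity, and tailors the value proposition to their situation.",
    4: "References role and some context, but the value proposition is only loosely tailored.",
    3: "Mentions the prospect's company or industry with a generic value proposition.",
    2: "Uses the prospect's name but nothing else specific to them.",
    1: "Fully generic; could be sent to anyone.",
}

FINANCE_ACCURACY_RUBRIC = {
    5: "All numbers match the source data and are presented in the right context.",
    4: "Numbers match the source data; minor context or labeling issues.",
    3: "Totals are correct but some supporting figures are missing.",
    2: "Totals are correct but key details are missing or pulled from the wrong time period.",
    1: "Numbers do not match the source data.",
}


def describe(rubric: dict[int, str], score: int) -> str:
    """Return the level description a rater checks an output against."""
    if score not in rubric:
        raise ValueError(f"score must be one of {sorted(rubric)}")
    return f"{score}: {rubric[score]}"


# A human rater (or an LLM judge prompt) works from the same written anchors,
# which keeps disagreements about scores from becoming disagreements about interpretation.
print(describe(PERSONALIZATION_RUBRIC, 2))
```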

This part is harder than it sounds. Writing a good rubric takes iteration. It needs to be specific enough to guide scoring, but flexible enough to handle the range of outputs an agent produces. You find edge cases that don’t fit, or dimensions where people keep disagreeing even with the rubric in front of them.

The goal is to get the team on the same page about what quality looks like. Without a shared rubric, differences in scores often come down to differences in interpretation, not differences in the output. A rubric won’t be perfect from the start, but it gives everyone a common reference point to build on.

How this connects to AI evals

In practice, AI evals depend on several things that trace back to human evaluation: the prompts that guide an LLM judge are shaped by the rubrics researchers write, the judge’s scores are calibrated against human ratings to check alignment, and inter-rater agreement on human evaluations is what gives confidence that the criteria are clear enough to automate. If any of those inputs are vague or inconsistent, the AI eval inherits those problems at scale.

In my previous post, How UX Research Shapes AI Evals, I shared how an LLM judge scoring sales outreach emails missed subtleties that human evaluators caught.

An LLM judge working from loosely defined criteria will produce scores that look precise but don't mean much. AI evals are only as good as the human judgments they're built to approximate. We define what quality means from the user's perspective, and over time, what we learn from running evals feeds back into how we define metrics and rubrics.
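As a rough illustration of that calibration loop, here is a minimal sketch with made-up scores and a simple exact-agreement measure; real evaluations typically also use chance-corrected statistics such as Cohen's kappa.

```python
# Hypothetical scores for illustration only; no real evaluation data.
# Each list holds 1-5 rubric scores for the same set of agent outputs.
human_rater_a = [5, 4, 2, 3, 5, 1, 4, 3]
human_rater_b = [5, 4, 3, 3, 5, 2, 4, 3]
llm_judge = [5, 5, 2, 3, 4, 1, 4, 2]


def exact_agreement(scores_x: list[int], scores_y: list[int]) -> float:
    """Fraction of items where two raters assign the identical rubric level."""
    assert len(scores_x) == len(scores_y)
    matches = sum(x == y for x, y in zip(scores_x, scores_y))
    return matches / len(scores_x)


# High human-to-human agreement suggests the rubric is clear enough to automate;
# the judge-to-human number shows how well the LLM judge approximates that
# shared standard. Any thresholds are judgment calls, not fixed rules.
print(f"human vs human: {exact_agreement(human_rater_a, human_rater_b):.2f}")
print(f"LLM judge vs human: {exact_agreement(llm_judge, human_rater_a):.2f}")
```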

Looking ahead

The tools and methods for AI evaluation will keep evolving. But the foundation is UX research: understanding what quality means for your users, translating that into metrics and rubrics, and continuing to refine them as user expectations evolve.

This is still evolving for us. The approach keeps getting refined as we apply it across different agents. If you’re working on something similar, share what you’re learning. That’s how we all get better at this.

Further reading

For a deeper dive into rubric design for AI evaluation, these resources are easy to read and provide strong academic and industry references:




Mastodon says its flagship server was hit by a DDoS attack

The DDoS attack against Mastodon's flagship server comes less than a week after Bluesky was targeted with junk web traffic.

.NET 11 Previews Focus on Nuts-and-Bolts Coding -- AI Not So Much

Remember when you had to really dig in, concentrate, and understand exactly how C# and other code worked at the most basic levels? Then you'll like Microsoft's early preview of .NET 11.