Sr. Content Developer at Microsoft, working remotely in PA, TechBash conference organizer, former Microsoft MVP, Husband, Dad and Geek.
146179 stories · 33 followers

Maddy Montaquila: .NET Update - Episode 386


https://clearmeasure.com/developers/forums/

Maddy Montaquila is a Senior Product Manager on the Aspire team. She was previously on the .NET MAUI team and has been working on .NET mobile apps since 2018, starting with Xamarin tooling. When she first joined Microsoft as an intern on the Xamarin team, she realized the impact she could have in creating amazing developer tools and frameworks, which inspired her to pursue a role as Program Manager. You can connect with her on Twitter and GitHub @maddymontaquila!

Mentioned in this episode:

GitHub - MAUI
Maddy's LinkedIn
.NET MAUI
GitHub - MAUI Samples
GitHub - Development Guide
Episode 244
Episode 120

Want to Learn More?

Visit AzureDevOps.Show for show notes and additional episodes.





Download audio: https://traffic.libsyn.com/clean/secure/azuredevops/Episode_386.mp3?dest-id=768873
Read the whole story · shared by alvinashcraft (Pennsylvania, USA)

499: Going Full Ralph, CLI, & GitHub Copilot SDK?!?!


In episode 499, James and Frank dive into the messy, exciting world of coding agents: from burning through Copilot credits and avoiding merge conflicts to practical workflows for letting agents run tasks while you sleep. They share real tips: break big features into bite-sized tasks, have agents ask clarifying questions, and use Copilot CLI or the new SDK to resolve conflicts, auto-fix lint/build failures, and automate mundane repo work.

The conversation then maps the evolution from simple completions to autonomous loops like Ralph — a structured, repeatable process that generates subtasks, runs until acceptance tests pass, and updates your workflow. If you’re curious how agents, MCPs and SDKs can elevate your dev flow or spark new automations, this episode gives pragmatic examples, trade-offs, and inspiration to start experimenting today.
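
As a rough illustration of the "run until the acceptance tests pass" idea, here is a minimal sketch of such a loop in Python. It is not the actual Ralph process, the Copilot CLI, or the Copilot SDK; ask_agent is a hypothetical stand-in for whatever coding agent you drive, and the acceptance check is assumed to be an ordinary pytest suite.

    import subprocess

    def ask_agent(prompt: str) -> None:
        """Hypothetical stand-in for invoking a coding agent (CLI, SDK, or otherwise)."""
        raise NotImplementedError

    def acceptance_tests_pass() -> bool:
        # A zero exit code from the test runner means the acceptance tests pass.
        return subprocess.run(["pytest", "-q"]).returncode == 0

    def ralph_style_loop(feature: str, max_iterations: int = 10) -> bool:
        # Break the feature into small subtasks, then iterate until the
        # acceptance tests pass or the budget runs out.
        ask_agent(f"Break this feature into small, independent subtasks: {feature}")
        for _ in range(max_iterations):
            if acceptance_tests_pass():
                return True
            ask_agent("Pick the next unfinished subtask, implement it, "
                      "and fix any failing tests or build errors.")
        return False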

Follow Us

⭐⭐ Review Us ⭐⭐

Machine transcription available on http://mergeconflict.fm

Support Merge Conflict

Links:





Download audio: https://aphid.fireside.fm/d/1437767933/02d84890-e58d-43eb-ab4c-26bcc8524289/8b7efb14-4670-4ab8-8385-44bc4e7fe967.mp3
Read the whole story · shared by alvinashcraft (Pennsylvania, USA)

302 - MCPs Explained - what they are and when to use them


MCPs are everywhere, but are they worth the token cost? We break down what Model Context Protocol actually is, how it differs from just using CLIs, the tradeoffs you should know about, and when MCPs actually make sense for your workflow.
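
For a concrete sense of what a Model Context Protocol server is, here is a minimal sketch in Python. It assumes the official MCP Python SDK's FastMCP helper (the mcp package); the server name and the word_count tool are made up for illustration.

    # A minimal MCP server exposing one tool, assuming the `mcp` Python SDK.
    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("demo-tools")

    @mcp.tool()
    def word_count(text: str) -> int:
        """Count the words in a piece of text."""
        return len(text.split())

    if __name__ == "__main__":
        # Runs over stdio so an MCP-capable client (editor, agent, chat app)
        # can launch the server and call the tool.
        mcp.run()

The token-cost question the episode raises comes roughly from the fact that every registered tool's name, description, and input schema is handed to the model as context, whereas a plain CLI tends to cost tokens only when the agent actually runs it and reads the output.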

Full shownotes at fragmentedpodcast.com/episodes/302.

Show Notes

Tips

Get in touch

We'd love to hear from you. Email is the best way to reach us, or you can check our contact page for other ways.

We want to hear all the feedback: what's working, what's not, and topics you'd like to hear more on. We want to make the show better for you, so let us know!

Co-hosts:

We transitioned from Android development to AI starting with Ep. #300. Listen to that episode for the full story behind our new direction.





Download audio: https://cdn.simplecast.com/audio/20f35050-e836-44cd-8f7f-fd13e8cb2e44/episodes/3950092c-154b-4a66-acdf-74983f895165/audio/6ff0e5ac-be30-422b-afd5-f3274136db63/default_tc.mp3?aid=rss_feed&feed=LpAGSLnY
Read the whole story · shared by alvinashcraft (Pennsylvania, USA)

AI-generated tests as ceremony


On epistemological soundness of using LLMs to generate automated tests.

For decades, software development thought leaders have tried to convince the industry that test-driven development (TDD) should be the norm. I think so too. Even so, the majority of developers don't use TDD. If they write tests, they add them after having written production code.

With the rise of large language models (LLMs, so-called AI) many developers see new opportunities: Let LLMs write the tests.

Is this a good idea?

After having thought about this for some time, I've come to the interim conclusion that it seems to be missing the point. It's tests as ceremony, rather than tests as an application of the scientific method.

How do you know that LLM-generated code works?

People who are enthusiastic about using LLMs for programming often emphasise the amount of code they can produce. It's striking how quickly the industry forgets that lines of code isn't a measure of productivity. We already had trouble with the amount of code that existed back when humans wrote it. Why do we think that accelerating this process is going to be an improvement?

When people wax lyrical about all the code that LLMs generated, I usually ask: How do you know that it works? To which the most common answer seems to be: I looked at the code, and it's fine.

This is where the discussion becomes difficult, because it's hard to respond to this claim without risking offending people. For what it's worth, I've personally looked at much code and deemed it correct, only to later discover that it contained defects. How do people think that bugs make it past code review and into production?

It's as if some variant of Gell-Mann amnesia is at work. Whenever a bug makes it into production, you acknowledge that it 'slipped past' vigilant efforts of quality assurance, but as soon as you've fixed the problem, you go back to believing that code-reading can prevent defects.

To be clear, I'm a big proponent of code reviews. To the degree that any science is done in this field, research indicates that it's one of the better ways of catching bugs early. My own experience supports this to a degree, but an effective code review is a concentrated effort. It's not a cursory scan over dozens of code files, followed by LGTM.

The world isn't black or white. There are stories of LLMs producing near-ready forms-over-data applications. Granted, this type of code is often repetitive, but uncomplicated. It's conceivable that if the code looks reasonable and smoke tests indicate that the application works, it most likely does. Furthermore, not all software is born equal. In some systems, errors are catastrophic, whereas in others, they're merely inconveniences.

There's little doubt that LLM-generated software is part of our future. This, in itself, may or may not be fine. We still need, however, to figure out how that impacts development processes. What does it mean, for example, related to software testing?

Using LLMs to generate tests

Since automated tests, such as unit tests, are written in a programming language, the practice of automated testing has always been burdened with the obvious question: If we write code to test code, how do we know that the test code works? Who watches the watchmen? Is it going to be turtles all the way down?

The answer, as argued in Epistemology of software, is that seeing a test fail is an example of the scientific method. It corroborates the (often unstated, implied) hypothesis that a new test, of a feature not yet implemented, should fail, thereby demonstrating the need for adding code to the System Under Test (SUT). This doesn't prove that the test is correct, but increases our rational belief that it is.

When using LLMs to generate tests for existing code, you skip this step. How do you know, then, that the generated test code is correct? That all tests pass is hardly a useful criterion. Looking at the test code may catch obvious errors, but again: Those people who already view automated tests as a chore to be done with aren't likely to perform a thorough code reading. And even a proper review may fail to unearth problems, such as tautological assertions.
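
As a hypothetical illustration of a tautological assertion (the example is mine, not from the article), a generated test can exercise the SUT and still verify nothing, because the expected value is computed the same way as the actual one:

    # A tautological test mirrors the implementation, so it cannot fail for
    # the reasons we care about.
    def total_price(items):
        return sum(item["price"] * item["quantity"] for item in items)

    def test_total_price_tautological():
        items = [{"price": 10, "quantity": 3}]
        expected = sum(i["price"] * i["quantity"] for i in items)  # same formula as the SUT
        assert total_price(items) == expected

    def test_total_price_meaningful():
        # A hand-computed expectation can actually disagree with the SUT.
        assert total_price([{"price": 10, "quantity": 3}]) == 30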

Rather, using LLMs to generate tests may lull you into a false sense of security. After all, now you have tests.

What is missing from this process is an understanding of why tests work in the first place. Tests work best when you have seen them fail.

Toward epistemological soundness

Is there a way to take advantage of LLMs when writing tests? This is clearly a field where we have yet to discover better practices. Until then, here are a few ideas.

When writing tests after production code, you can still apply empirical Characterization Testing. In this process, you deliberately temporarily sabotage the SUT to see a test fail, and then revert that change. When using LLM-generated tests, you can still do this.
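
A minimal sketch of that verification step, with a made-up SUT (the is_leap_year function is only an illustration): keep the generated test, break the SUT on purpose, confirm the test now fails, and revert.

    # Characterization test for an existing (possibly LLM-generated) function.
    def is_leap_year(year):
        return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

    def test_is_leap_year_characterization():
        assert is_leap_year(2000) is True
        assert is_leap_year(1900) is False
        assert is_leap_year(2024) is True

    # To trust this test, temporarily sabotage the SUT, e.g. change the first
    # comparison to `year % 4 == 1`, run the suite, watch this test fail, and
    # then revert the change. A test you have never seen fail tells you little.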

Obviously, this requires more work, and takes more time, than 'just' asking an LLM to generate tests, run them, and check them in, but it would put you on epistemologically safer ground.

Another option is to ask LLMs to follow TDD. On what's left of technical social media, I see occasional noises indicating that people are doing this. Again, however, I think the devil is in the details. What is the actual process when asking an LLM to follow TDD?

Do you ask the LLM to write a test, then review the test, run it, and see it fail? Then stage the code changes? Then ask the LLM to pass the test? Then verify that the LLM did not change the test while passing it? Review the additional code change? Commit and repeat? If so, this sounds epistemologically sound.
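
One way to make that discipline explicit is to script the checkpoints yourself. The sketch below is only an outline of the process described above, under a few assumptions: ask_llm is a hypothetical stand-in for whatever model or agent you use, the project has an ordinary pytest suite, and the tests live under a tests/ directory.

    import subprocess

    def ask_llm(prompt: str) -> None:
        """Hypothetical stand-in for asking an LLM or agent to edit the working tree."""
        raise NotImplementedError

    def run(*cmd: str) -> int:
        return subprocess.run(cmd).returncode

    def tdd_step(requirement: str) -> None:
        ask_llm(f"Write one failing test for: {requirement}")
        assert run("pytest", "-q") != 0, "the new test should fail first"
        run("git", "add", "-A")  # stage the test so later changes show up as diffs
        ask_llm("Make the failing test pass without modifying any test code")
        assert run("pytest", "-q") == 0, "the implementation should make the suite pass"
        # Verify the LLM did not quietly change the test it was supposed to satisfy.
        assert run("git", "diff", "--quiet", "--", "tests/") == 0, "test code was modified"
        # Human review of the diff happens here, then commit and repeat.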

If, on the other hand, you let it go in a fast loop where the only observation your human brain can keep up with is that the test status oscillates between red and green, then you're back to where we started: this is essentially ex-post tests with extra ceremony.

Cargo-cult testing

These days, most programmers have heard about cargo-cult programming, where coders perform ceremonies hoping for favourable outcomes, confusing cause and effect.

Having LLMs write unit tests strikes me as a process with little epistemological content. Imagine, for the sake of argument, that the LLM never produces code in a high-level programming language. Instead, it goes straight to machine code. Assuming that you don't read machine code, how much would you trust the generated system? Would you trust it more if you asked the LLM to write tests? What does a test program even indicate? You may be given a program that ostensibly tests the system, but how do you know that it isn't a simulation? A program that only looks as though it runs tests, but is, in fact, unrelated to the actual system?

You may find that a contrived thought experiment, but this is effectively the definition of vibe coding. You don't inspect the generated code, so the language becomes functionally irrelevant.

Without human engagement, tests strike me as mere ceremony.

Ways forward

It would be naive of me to believe that programmers will stop using LLMs to generate code, including unit tests. Are there techniques we can apply to put software development back on more solid footing?

As always when new technology enters the picture, we've yet to discover efficient practices. Meanwhile, we may attempt to apply the knowledge and experience we have from the old ways of doing things.

I've already outlined a few techniques to keep you on good epistemological footing, but I surmise that people who already find writing tests a chore aren't going to take the time to systematically apply the techniques for empirical Characterization Testing.

Another option is to turn the tables. Instead of writing production code and asking LLMs to write tests, why not write tests, and ask LLMs to implement the SUT? This would entail a mostly black-box approach to TDD, but still seems scientific to me.

For some reason I've never understood, however, most people dislike writing tests, so this is probably unrealistic, too. As a supplement, then, we should explore ways to critique tests.

Conclusion

It may seem alluring to let LLMs relieve you of the burden of writing automated tests. If, however, you don't engage with the tests they generate, you can't tell what guarantees they give. If so, what benefits do the tests provide? Does automated testing become mere ceremony, intended to give you a nice warm feeling with little real protection?

I think that there are ways around this problem, some of which are already in view, but some of which we have probably yet to discover.


This blog is totally free, but if you like it, please consider supporting it.
Read the whole story · shared by alvinashcraft (Pennsylvania, USA)

Aspire 13.1 Brings MCP Integration, CLI Enhancements, and Azure Deployment Updates


Aspire 13.1 has been released as an incremental update that builds on the polyglot platform foundation introduced with Aspire 13. The release focuses on improving developer productivity through enhancements to the command-line interface, deeper support for AI-assisted development workflows, refinements to the dashboard experience, and clearer deployment behavior for Azure-based environments.

By Almir Vuk
Read the whole story · shared by alvinashcraft (Pennsylvania, USA)

The advent of a lazy software engineer


We've all heard about the new laziness creeping up on us. It's all about spawning as many coding agents as possible, often swarming them, and enjoying the ride of being “just a reviewer”. Even though that's kind of lazy, I don't think it's lazy enough.

What is this new laziness like?

We live in a truly new era. Every time I have a few tasks being worked on by Codex in the cloud, I'm truly in awe. So far, my PR (personal record, not the pull request!) has been juggling a maximum of 6 at the same time, asking for minor or major adjustments. It still sometimes feels like micromanagement: do this, do that, don't you dare do xyz. But still, one person can choreograph so many things nowadays!

And don’t get me started with fancy tooling like MCPs or skill files! The sole fact that there are agents out there doing this work for you. How marvelous it is. But it also brings the fundamental question of any knowledge worker:

Are you working on the most important thing?

Now, this can be extended to:

Are your agents working on the most important thing?

Should they burn through tokens to build things up, almost from scratch? Or could we use their token budgets in a better way?

It’s about…

The whole point is that coding agents are able to code things… Now we've said it. But is it about coding things, or is it about building things?

Here, I just said it. Building. Not coding a piece of software, not drafting a database schema, not making schema management easier to maintain. It's about building value. And one thing that building requires is gluing pieces together.

Now, let’s count how many times you wrote a code that was taking some data from a database and pushed it elsewhere. Or performed slight alterations and stored them back. Or accepted an influx of data to store them in a database. It’s ok, if you’re getting paid for the gluing, but are you? Is this repeatable work, that can be LLMed away nowadays, worth spending your time on? The maintenance of it will be a cost anyway.

I’d ask further, is it worth being LLMed away over and over again? I don’t think this is true, especially if we consider that something has to run it! No matter how serverless your solution is, no matter how well scalable your K8 pods are, if you need to run it, you need to run it. And it’s better not to have this obligation. So…

It’s time to delegate

I think the times when a piece of infrastructure could stand on its own are over. The ability to delegate a piece of the gluing inward, to a part of the solution that you buy rather than build, is a must-have. I think that you and your coding agents should become terribly lazy. You should push and delegate not only the coding to the agents, and not only the gluing to infrastructure provided by third parties, but go even further. What if the infrastructure could perform some work on your behalf?

And let’s reiterate again. If you spend your CPU cycles on running a serverless function or “just a docker image” somewhere to react to every single event, and you don’t delegate things to the infrastructure, I think it’s time to re-evaluate the infrastructure and your choices.

Examples?

Azure Event Grid

Are you thinking about reacting to blobs being created or deleted? Do you need to scan for changes (like it’s the 20th century or something)? You can be notified of things happening via Azure Event Grid. Blob Storage has its own set of events ready to be raised for you. You need to consume them, this is for sure, but the publishing part is done for you!
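
As a rough sketch of the consuming side, assuming the Azure Functions Python worker's Event Grid trigger binding (the function body and log messages are illustrative):

    import logging
    import azure.functions as func

    def main(event: func.EventGridEvent) -> None:
        # Invoked by an Event Grid subscription on the storage account;
        # Blob Storage raises the event, so there is nothing to poll.
        data = event.get_json()
        if event.event_type == "Microsoft.Storage.BlobCreated":
            logging.info("New blob: %s", data.get("url"))
        elif event.event_type == "Microsoft.Storage.BlobDeleted":
            logging.info("Blob removed: %s", data.get("url"))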

DynamoDB Streams

Let’s assume that you build on AWS DynamoDB and you want to react to some data changes. You configure an AWS Lambda function to be triggered by the stream of changes. The platform itself becomes the glue.

RavenDB ETL for Kafka

Some documents in RavenDB should have parts of them streamed through to Kafka? Extract an identifier plus some data. The gluing is provided as a Kafka ETL task, even in cloud instances. You delegate it. You don't build and maintain it.

The box

I think every single solution that is built could use some help from a box. Whoever builds it needs to become lazy enough to use a part of the solution that is capable of understanding a bit more and reaching a bit further. Something that is capable of publishing information, reaching out for data, and then providing the gluing on its own. It should also be able to store procedures to be performed on behalf of the user who configured it (or an agent, if that's the case). It's building time, not gluing time. Leave the gluing, and the expert knowledge required to make it work, to the box, and enjoy the full E2E experience. Out of the box!



Read the whole story · shared by alvinashcraft (Pennsylvania, USA)