
OpenAI Accused of Training GPT-4o on Unlicensed O'Reilly Books

A new paper [PDF] from the AI Disclosures Project claims OpenAI likely trained its GPT-4o model on paywalled O'Reilly Media books without a licensing agreement. The nonprofit organization, co-founded by O'Reilly Media CEO Tim O'Reilly himself, used a method called DE-COP to detect copyrighted content in language model training data. Researchers analyzed 13,962 paragraph excerpts from 34 O'Reilly books, finding that GPT-4o "recognized" significantly more paywalled content than older models like GPT-3.5 Turbo. The technique, also known as a "membership inference attack," tests whether a model can reliably distinguish human-authored texts from paraphrased versions. "GPT-4o [likely] recognizes, and so has prior knowledge of, many non-public O'Reilly books published prior to its training cutoff date," wrote the co-authors, who include O'Reilly, economist Ilan Strauss, and AI researcher Sruly Rosenblat.
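For readers curious how such a quiz-style test works mechanically, here is a minimal sketch of the DE-COP idea in TypeScript. This is an illustration only, not the paper's actual harness: the askModel helper and the quiz data are hypothetical placeholders standing in for any chat-completion API and corpus.

// Sketch of a DE-COP-style membership inference test (hypothetical harness).
type QuizItem = {
    original: string // verbatim excerpt from the book
    paraphrases: string[] // LLM-generated paraphrases of the same excerpt
}

async function decopScore(
    items: QuizItem[],
    askModel: (prompt: string) => Promise<string> // placeholder model client
): Promise<number> {
    let correct = 0
    for (const item of items) {
        // shuffle the verbatim passage in among its paraphrases
        const options = [item.original, ...item.paraphrases]
            .map((text) => ({ text, key: Math.random() }))
            .sort((a, b) => a.key - b.key)
            .map((o) => o.text)
        const prompt = [
            "Which of the following passages is the verbatim, human-authored text?",
            ...options.map((text, i) => `${i + 1}. ${text}`),
            "Answer with the number only.",
        ].join("\n")
        const reply = await askModel(prompt)
        if (parseInt(reply, 10) === options.indexOf(item.original) + 1) correct++
    }
    // with one original and three paraphrases, guessing yields ~25% accuracy;
    // a significantly higher rate suggests the model saw the original in training
    return correct / items.length
}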



Vanishing Culture: Cultural Preservation and Queer History


The following guest post from artist and writer Brooke Palmieri is part of our Vanishing Culture series, highlighting the power and importance of preservation in our digital age. Read more essays online or download the full report now.

As a writer and artist who draws on the long history of gender nonconformity in my work, a driving force behind my practice is the idea that a longing for history will always be a fundamental aspect of humanity, so long as memory itself serves as a foundation for human consciousness. Everyone has a history, but the majority of people are not taught how to look back in order to find it. One problem is the depth and breadth of our losses. People and their prized possessions are destroyed by accident and by design throughout history: armed conflict, invasion, willful destruction, natural disaster, decay. Then there is the fantasy of destruction, a destructive force in its own right, the perception that nothing survives. That fantasy begets a reality of its own: because I don’t go looking for what survives, I don’t find it, or I don’t recognize it when I see it. This is true across subcultures and among historically marginalized or oppressed groups, and for the queer and trans subjects whose histories I am interested in recovering in particular. In the twenty-first century, access to queer and trans history is an accident of birth: knowing someone in your family or neighborhood, living in a place where it isn’t legislated against, going to a school that dares teach it, affording admission into one of the universities that offers classes on it.

“Most queer publishing from the 1960s onward was issued in small, independent presses that have disappeared.”
Brooke Palmieri
Artist & writer

My research process tends to triangulate among the archive of my own weird and imperfect human experiences and the debris I collect around them, the small collections amassed by and for queer and trans people, and the larger institutions that also contain relevant material that begs to be recontextualized. Or to make it personal: to write my upcoming book Bargain Witch: Essays in Self Initiation, I used my journals and the Wayback Machine to look at old websites I’d made when I was 14, the archive of the William Way LGBT center in Philadelphia where I grew up, and special collections at major institutions like the Fales Library at NYU, the Digital Transgender Archive at Northeastern University, and the British Library in London. All my adult life I’ve made pilgrimages from the intimate domestic spaces where people preserve their own histories, to local collections set up on shoestring budgets as a labor of love, to the vast, climate-controlled repositories of state and higher education that have more recently begun to preserve our histories, each enhancing what it is possible for me to know, delight in, or mourn, about where I have come from, the forebears by blood and by choice that imbue my life with its many possibilities.

It’s a creative act to find and make sense of my own history, one that requires a leap of faith in order to fill in the silences, erasures, omissions, and genuine mysteries that old books and documents, records and artifacts, represent. A lot is left to the imagination. Much of what survives from the past asks more questions than we can answer. This is true for queer and trans archival traces, as it is for other aspects of humanity that are poorly accounted for in public records, or actively discriminated against through surveillance and omission in equal parts.

Classically, archives are brutal, desolate places to find humanity; they were never meant to record the nuances of flesh-and-blood existence so much as they originated as a way for governments to keep track of their resources. It has taken millennia for us to conceive of records as places where humanity might be honored rather than betrayed. This is an epic change: I am in awe of the fact that I live in a time where the heft of documentary history—clay, parchment, paper, and now pixel—is shifting paradigms from records kept by anonymous paid laborers to flatten life into statistics, to records kept by people who dare to name themselves and their subjectivity, who collect something of themselves and their obsessions, for other kindred spirits to find. From archives as places meant to consolidate power, to places containing mess and sprawl, places for heated encounters.

In the past few decades of “living with the internet” these places and encounters have multiplied exponentially, as queer and trans subcultures have relied on message boards, blogs, and personal websites to share information. I personally relied (and still rely) on reddit, the classic Hudson’s FTM Resource Guide (www.ftmguide.org), and TopSurgery.Net to navigate the healthcare system in both the UK and USA in order to access hormones and surgery—part of a much longer tradition of “the Transgender Internet” that Avery Dame-Griff chronicles in his book The Two Revolutions (2023). To say nothing of AOL in the early 2000s, the culture on Tumblr in the early-to-mid 2010s, and printed publications like Original Plumbing and archived copies of the FTM Newsletter. Digital environments informed me of physical places, and vice versa, and each expanded and embellished my appreciation of the other. From reading books and trawling the internet, I knew places like San Francisco, New York, London, and Berlin would be where I could find other trans people. When I moved to London, I knew to go to Gay’s the Word, a queer bookshop that first opened in 1979, to make friends, and eventually, to get a job. When I started my own queer book club, or wanted to find zine fairs or club nights, I often found information about them on tumblr or instagram. When traveling to new cities, a gay friend tipped me off that any place recommended by BUTT Magazine would show me a good time.

But in my queer and trans context, both digital and paper-based archives and libraries are often labors of love, made from scratch, published with a “by us for us” ethos that is under-resourced and so always in danger of disappearing. Most queer publishing from the 1960s onward was issued in small, independent presses that have disappeared. An interesting model for documenting this is the British Library’s Endangered Archives Programme, where resources and expertise are shared to catalog and digitize collection materials in a centrally kept database. This is mutually beneficial to the places where the materials are kept—accurate cataloging is crucial to using and developing any archive—as well as to interested audiences further afield. And this also feels like a pragmatic approach to the reality of loss: we might not be able to predict what will survive over time, but keeping abundant records in multiple locations of what has existed will at least allow us to mourn our losses.

Download the complete Vanishing Culture report.

A culmination of my interests in hunting and gathering queer history is my imprint and traveling installation: CAMP BOOKS. I started CAMP BOOKS in 2018 as a way to highlight the places I’d most enjoyed meeting queer and trans people–independent bookshops, which have a rich, radical history throughout subculture–and as a way to keep the focus on making and distributing publications about the obscure histories I was unearthing in my research. Before libraries sought to cater to an LGBTQIA+ readership, specialist bookshops like the Oscar Wilde Memorial Bookshop, Giovanni’s Room, and Gay’s the Word were the only places you could find concentrations of queer, feminist books with positive portrayals of queer lives. These shops were hubs of culture: places where community events were held and publicized, activist groups were able to meet, and friendship and romance could blossom in broad daylight. CAMP BOOKS sets up pop-up bookshops, tables at art and zine fairs, and also builds installations in galleries and community spaces to continue this tradition. I also sell rare books and ephemera related to queer history through CAMP BOOKS in order to fund our efforts, including new publications, zines, and posters related to queer and trans history. The CAMP BOOKS motto is: “Queer Pasts Nourish Queer Futures,” and this extends to our model of generating funds from past efforts to fund new writing and work. I also believe this logic can extend to anyone: preserving what interests you about the past brings a particular pleasure of connection into the present. Most of the people I have loved in my life, I have met and known through shared obsessions with the past, and it has brought a lot of pleasure and adventure into my life.

An abiding concern I have had about cultural preservation—in my case, subcultural preservation, because the people I love across time existed in a myriad of DIY subcultures that often cross-pollinated art, music, and literary influences—and have heard from others, revolves around the question of inheritance and access. You can inherit books and papers, art and artifacts, but you can’t inherit e-books, and born digital archives require specialist care and technology. My hope is that this divide is bridged by reconsidering the nature of inheritance itself: rather than an individual’s gain, queer and trans history is something that we are all heir to, and can all benefit from accessing. Large institutions, and large digital repositories in particular, play a crucial role in rewriting the meaning of inheritance by offering freely accessible, and accurately cataloged, information. But ultimately, archives that document human experience begin at home, and rely on people whose love for their lives, their friends, and their scenes inspire them to save posters, photographs, and other receipts–online and offline–that document their experiences. I hope after reading this, you start saving something of your life now.

About the author

Brooke Palmieri is an artist and writer working at the intersection of memory, history, and gender-bending alternate realities. In 2018, Brooke founded CAMP BOOKS, promoting access to queer history through rare archival materials, cheap zines, and workshops/installations. His book, Bargain Witch, comes out in Fall 2025 from Dopamine Books. You can find out more at http://bspalmieri.com.


What’s new with Microsoft in open-source and Kubernetes at KubeCon + CloudNativeCon Europe 2025


I am thrilled that the Microsoft Azure team is joining the community again at this year’s KubeCon + CloudNativeCon Europe 2025 in London! We have exciting new enhancements and innovations to share and can’t wait to showcase all the updates in Azure and Azure Kubernetes Service (AKS), as well as our ongoing contributions to the cloud native community.

Join Microsoft Azure at KubeCon Europe 2025

Find us at Booth #N150

I have talked previously on this blog about Microsoft’s commitment to supporting and driving innovation in the cloud native ecosystem through contributions and leadership from engineers across Azure. Since my last update at KubeCon + CloudNativeCon North America 2024, we have continued our investments in growing existing CNCF projects, while also launching new projects to meet the community’s evolving needs.

To that end, we turned to the recent Cloud Native Computing Foundation (CNCF) Ecosystem Gaps report, which highlights security, complexity, and cost management as the top three gap areas in the ecosystem. These areas are where our teams are focusing their efforts to help improve end user experiences.

Enhancing security for Kubernetes environments

In the context of today’s complex ecosystems, security is a fundamental necessity and undeniably a huge area of concern for teams building and running cloud native solutions. Microsoft has made several key contributions to enhancing security for Kubernetes environments:

  • Istio’s ambient mode, now generally available, is a new feature that provides mTLS, traffic management, and observability with lower cost and operational overhead than ever before.
  • Hyperlight (recently accepted into the CNCF Sandbox) is a Rust library for executing small, embedded functions using hypervisor-based protection for each function call at scale.
  • Hyperlight-Wasm enables any programming language compiled to a WebAssembly component to execute in a protected Hyperlight micro-VM using Wasmtime.
  • Ratify is a verification engine, available as a binary executable and on Kubernetes, that verifies artifact security metadata and admits for deployment only those artifacts that comply with policies you create. We recently added capabilities to enhance Ratify’s contributions to supply chain security.
  • Ensure Secret Pulled Images, in alpha, enhances security in Kubernetes by ensuring that images pulled with credentials are only accessible to workloads presenting the same credentials, even in IfNotPresent/Never pull-policy scenarios.
  • Projected Service Account Tokens for Kubelet Image Credential Providers is another Kubernetes feature we’ve brought to alpha, which enables secure workload identity for image pulls by allowing Kubelet to exchange ServiceAccount tokens for credentials.
  • ClusterTrustBundle, in beta, provides a more stable API for easier X.509 certificate trust distribution in Kubernetes.

Managing complexity in a cloud native ecosystem

The ever-increasing complexity of the cloud native ecosystem is a perennial challenge (we have all seen the CNCF’s expansive Cloud Native Landscape diagram), and projects and tools that streamline that complexity make it simpler to build, run, and manage Kubernetes workloads anywhere. Notable contributions to help address complexity include:

  • Drasi (recently accepted as a CNCF Sandbox project) is a change data processing platform that automates real-time detection, evaluation, and meaningful reaction to events in complex, event-driven systems.
  • KubeFleet (recently accepted into the CNCF Sandbox) is a cloud native solution tailored for the at-scale management of applications running in multiple Kubernetes clusters, providing orchestration and coordination of applications across a fleet of Kubernetes clusters.

Cost management capabilities

Evolving economic conditions mean that cost management is front of mind for many organizations and teams. We are active contributors to several features and projects that help with cost-management capabilities:

  • Dynamic Resource Allocation (DRA) allows dynamic allocation of specialized hardware resources beyond traditional CPU and memory, like GPUs and FPGAs, enabling better resource usage and reducing idle hardware. The goal is to simplify the integration of specialized accelerators for hardware vendors without changing Kubernetes core components. In response to community feedback, DRA introduced a revised API in v1.31 and has reached beta in v1.32. We’re working with the community to improve DRA’s stability, aiming for general availability.
  • Karpenter enables maximally cost-efficient, fully automatic node infrastructure to run your workloads. The project continues to evolve at the speed of Kubernetes, incorporating the latest scheduling features (such as new topologySpread constraints), and focusing on stability and performance.
  • Cluster Autoscaler continues to fulfill its role as the de facto Kubernetes node autoscaler, delivering functional support for DRA in its v1.32 release.
  • SpinKube (newly joining the CNCF Sandbox) allows for running serverless WebAssembly workloads in Kubernetes, offering the option of more performant and secure serverless scenarios in Kubernetes.

Our commitment to building in the open

While I have highlighted some of our recent work on CNCF projects that helps address the challenges of security, complexity, and cost optimization, this is certainly not an exhaustive view of the broader work that our team does in the community. In fact, Microsoft has been one of the most active contributors to CNCF projects over the last year! We create and contribute to several CNCF projects, including:

  • Graduated (containerd, Cilium, Dapr, Envoy, Helm, Istio, KEDA, Kubernetes, Open Policy Agent).
  • Incubating (Flatcar, Notary Project, OpenCost).
  • Sandbox (Copa, Drasi, Eraser, Headlamp, Inspektor Gadget, KubeFleet, Kubernetes AI Toolchain Operator (KAITO), OCI Registry as Storage (ORAS), Radius, Ratify, SpinKube, VS Code Kubernetes Tools).
  • Applying (Hyperlight).

Whether it be serving on the CNCF’s Technical Oversight Committee, as Special Interest Group Chairs and Tech Leads, or contributing as maintainers on a wide variety of ecosystem projects, Azure team members chop wood and carry water to keep our open source communities running smoothly.

You can meet many of our contributors in the Azure booth and Project Pavilion at KubeCon!

Azure Kubernetes Service announcements

In addition to our work in the upstream community, I am happy to share several new capabilities in Azure Kubernetes Service (AKS). Our customers can take advantage of improved AI capabilities, enhanced security and networking, simplified multi-cluster operations, and better cost efficiency.

Improved AI capabilities

AI continues to play a pivotal role in driving innovation and maintaining competitiveness. We are introducing several new AI capabilities that underscore the importance of advanced search, high-throughput model inferencing, and customizable setup, including:

  • Retrieval-augmented generation (RAG) in the Kubernetes AI Toolchain Operator (KAITO) enables advanced search capabilities using open-source KAITO on your AKS cluster.
  • Default inference with vLLM in the AI toolchain operator add-on offers significantly faster processing of incoming requests and greater flexibility in API and model selection.
  • The ability to install custom GPU drivers for a more customizable setup.

Enhanced networking and security

Robust security and reliable networking are critical not only for protecting applications and data, meeting compliance requirements, and ensuring seamless connectivity, but are also essential for maintaining trust with users and stakeholders. Some recent networking and security enhancements in AKS include:

  • Network isolated clusters, now generally available, simplify the process of restricting network access and reduce the risk of unintentional exposure of public endpoints.
  • Improved load balancing and support for multiple load balancers allow for better scalability and flexibility.
  • Improved network endpoint management with Cilium Endpoint Slices and broader networking improvements, including support for dual-stack networking.
  • Advanced Container Networking Services enhancements provide fine-grained control over application traffic and detailed network traffic logs for better security auditing and performance analysis.

Simplified operations management at scale

Managing multi-cluster Kubernetes environments at scale means keeping configurations consistent and secure across clusters, while also ensuring smooth monitoring and data handling. New capabilities to enable teams to manage more efficiently at scale include:

  • Multi-cluster auto-upgrade in Azure Kubernetes Fleet Manager, now generally available, makes it simpler to safely and predictably update Kubernetes and node images in multi-cluster environments. Additionally, multi-cluster workload rollout strategies and eviction controls improve operational efficiency and control.
  • Deployment recommendations ensure seamless cluster creation, even when the selected SKU is unavailable, by suggesting alternative SKUs based on available capacity.
  • AKS communication manager simplifies maintenance notifications and monitoring by providing timely alerts and detailed failure reasons, reducing operational hassles and enhancing observability.

Greater visibility and cost efficiency

AKS is also introducing additional metrics and efficiency features to enable advanced observability and cost management. These include cost recommendations tailored to your cluster configuration, new Azure platform metrics for monitoring control plane components, and seamless monitoring for Java and Node microservices through auto-instrumentation.

Additionally, the Microsoft GitOps team is announcing the Private Preview of ArgoCD, delivered as a cluster extension across AKS and Arc-enabled Kubernetes, offering easy deployment, official support, and enhanced security for an enterprise-grade GitOps experience from cloud to edge.

We’re excited to meet up with you at KubeCon + CloudNativeCon

The Azure team is excited to be at KubeCon + CloudNativeCon Europe 2025, and I hope that you are too! There are many ways to connect with our team in London.

The team can’t wait to meet you and hear your thoughts.

Happy KubeCon + CloudNativeCon!

Microsoft Azure at KubeCon Europe 2025

Join us at KubeCon Europe 2025 in London, UK, from April 1-4.



Meet the AWS News Blog team!


Now that Jeff Barr has retired from the AWS News Blog (as of December last year), the AWS News Blog team will keep sharing the most important and impactful AWS product launches the moment they become available. I want to quote Jeff’s last comment on the future of the News Blog again:

Going forward, the team will continue to grow and the goal remains the same: to provide our customers with carefully chosen, high-quality information about the latest and most meaningful AWS launches. The blog is in great hands and this team will continue to keep you informed even as the AWS pace of innovation continues to accelerate.

Since 2016, Jeff has been building the AWS News Blog as a team. Currently, we’re a group of 11 bloggers working in North America, South America, Asia, Europe, and Africa. We work closely with AWS product teams, testing new features firsthand on behalf of customers and delivering key details in the News Blog the way Jeff has always done.

The Leadership Principles for AWS News Bloggers that Jeff shared on LinkedIn are a textbook for anyone writing for customers in tech companies. They’re the fundamentals that can help you understand and get started blogging quickly, and we’ll continue to stick to these principles with our team. This is why the AWS News Blog is different from other tech companies’ product news channels.

Voices from blog writers
You may be familiar with the names of News Blog writers, but you may not have had the chance to hear about them. Let us introduce ourselves!

Channy Yun (윤석찬)

I’m honored to continue Jeff’s legacy as the new lead blogger of the News Blog team; he is my role model. When I joined AWS in 2014, the first thing I did was create the AWS Korea Blog, and I started translating Jeff’s blog posts into Korean. During that journey, I learned how to write accurate, honest, and powerful guides to help customers get started with new AWS products and features.

Danilo Poccia

Since my first News Blog post in 2018, I have learned so much by being part of this team. Working with product managers and service teams is always an amazing experience. I am interested in serverless, event-driven architectures, and AI/ML. It’s incredible how technologies like generative AI are becoming part of software development implicitly (through AI-enabled development tools) and explicitly (by using models in code).

Sébastien Stormacq

I’m fortunate to have been a part of this team since 2019. When I’m not writing posts, I produce episodes of the AWS Developers Podcast and le podcast AWS en français. I also work with the Amazon EC2 Mac, AWS SDK for Swift, CodeBuild, and CodeArtifact teams, trying to make the AWS Cloud easier to use for Apple developers. My pet project is the Swift Runtime for AWS Lambda.

Veliswa Boya

The Amazon Leadership Principles (LPs) guide all that we do here at AWS, including the work we do as authors of the News Blog. As a developer advocate, I’ve taken the guidance of the LPs and used it to guide members of the AWS community who are looking to create technical content, especially those new in their technical content creation journey.

Donnie Prakoso

Just like brewing coffee, being a blog author has been a mix of fun, challenge, and reward. I’ve been particularly fortunate to observe how customer obsession is built into AWS teams. I’ve seen how they work backwards, transforming your feedback into services or features. I genuinely hope that you enjoy reading our articles and look forward to the next chapter of the News Blog team.

Esra Kayabali

As an author, I’m committed to delivering timely information about the latest AWS innovations and launches to our global audience of builders, developers, and technology enthusiasts. I understand the importance of providing clear, accurate, and actionable content that helps you use AWS services effectively. Happy reading everyone!

Matheus Guimaraes

My specialties are .NET development and microservices, but I’ve always been a jack-of-all-trades and writing for this blog helps me to keep my knife sharp across all corners of modern technology, while also helping others do the same. Thousands of people read the AWS News Blog and use it as a go-to source to keep up with what’s new and to help them make decisions, so I know that what we are doing is meaningful work with huge impact.

Prasad Rao

Through my blogs, I strive to highlight not just the “what” of new services, but also the “why” and “how” they can transform businesses and user experiences. As a solutions architect specializing in Microsoft Workloads on AWS, I help customers migrate and modernize their workloads and build scalable architecture on AWS. I also mentor diverse people to excel in their cloud careers.

Elizabeth Fuentes

Every time I start writing a new blog, I feel honored to be part of this team, to be able to experiment with something new before it’s released, and to be able to share my experience with the reader. This team is made up of specialists of all levels from multiple countries; together, we are a multicultural and multi-specialty team. Thank you, reader, for being here.

Betty Zheng (郑予彬)

Joining the News Blog team has transformed how I communicate about technology. With an ever-curious mindset, I approach each new announcement aiming to make innovative services accessible and engaging. By bringing my unique and diverse perspective to technical content, I strive to help developers truly enjoy exploring our latest technologies.

Micah Walter

As a senior solutions architect, I support enterprise customers in the New York City region and beyond. I advise executives, engineers, and architects at every step along their journey to the cloud, with a deep focus on sustainability and practical design.

I also want to give credit to our behind-the-scenes editor-in-chief, Jane Watson, and program manager, Jane Scolieri, who play an essential role in helping us get product launch news to you as soon as it happens, including the 60 launches we announced in one week at re:Invent 2024!

Share your feedback
At AWS, we are customer obsessed. We’re always focused on improving and providing a better customer experience, and we need your feedback to do so. Take our survey to share insights about your experience with the AWS News Blog and suggestions for how we can serve you even better.

This survey is hosted by an external company. AWS handles your information as described in the AWS Privacy Notice. AWS will own the data gathered via this survey and will not share the information collected with survey respondents.

Channy


AST Grep and Transform


This page describes a strategy to build GenAI scripts that use Abstract Syntax Trees (AST) to parse and modify source code. When applicable, it provides an extremely flexible and stable way to apply large-scale changes to source code. Interested? Let’s dive in!

The strategy of AST-based code transformation

One of the challenges when creating GenAI scripts that update source code is correctly locating the code to change. The slightest mistake in location can lead to broken code. This is especially true when the code to update is not a simple string, but a complex structure like an object or a function call.

In some cases, you know “precisely” which part of the code you want to update. For example, you want to refresh the documentation of a function after a change. You know that the documentation is located just before the function definition, at least in the sense of the programming language, but the number of empty lines or spaces may vary.

math.ts
/** sums a and b */
function fn(a: number, b: number): number {
    return a - b // oops outdated
}

In such a scenario, you can use the Abstract Syntax Tree (AST) to locate the code to update. The AST is a tree representation of code.
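For the math.ts snippet above, the parse tree looks roughly like this (node names follow the TypeScript grammar that ast-grep uses; simplified for illustration):

program
├── comment                /** sums a and b */
└── function_declaration
    ├── identifier         fn
    ├── formal_parameters  (a: number, b: number)
    └── statement_block
        └── return_statement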

So instead of fighting spaces and new lines, you can just locate the function_declaration node that follows a comment node.

docs.genai.mts
const functionAndComment = sg.find_function_with_outdated_comment()

Once you’ve located the node to update, you can apply any transformation you want, e.g., replace it with other text. In terms of GenAI script, this means you can build a prompt that includes as much context as you need and generate a response.

docs.genai.mts
$`Update the documentation of the function 'fn' to reflect the new behavior of the function.`
fence(functionAndComment.text())
The LLM might respond with: subs a and b

Once the LLM responds with the new comment, you could insert it as the new content of the node in the AST.

docs.genai.mts
functionAndComment.comment().text = response

Voila! You’ve only touched the part of the file you wanted to update!

math.ts
/** subs a and b */
function fn(a: number, b: number): number {
    return a - b
}

To recap, this strategy is based on the following steps:

  1. search: use the AST to locate the node to update.
  2. transform and replace: use the LLM to generate the new content of the node.
  3. commit: update the node in the AST with the new content.

ast-grep


ast-grep (sg) is a fast and polyglot tool for structural code search, linting, and rewriting at large scale. sg provides the AST search-and-replace capabilities we need to implement the strategy above.

GenAIScript benefits from ast-grep’s excellent Node.js integration, which is available through the host.astGrep() method.

docs.genai.mts
// search
const { matches, replace } = await sg.search("ts", "src/*fib*.ts", {
    rule: {
        kind: "function_declaration",
        not: {
            precedes: {
                kind: "comment",
                stopBy: "neighbor",
            },
        },
    },
})
// transform
const edits = sg.changeset()
for (const match of matches) {
    const { text } = await prompt`Generate new docs for ${match.text()}`
    // replace
    edits.replace(match.comment(), text) // it's somewhat more involved
}
// commit all edits to file
await workspace.writeFiles(edits.commit())

Sample: Doc generator / updater

You will find a full write-up of the making of the documentation generator/updater script in the documentation. I encourage you to read it to dig deeper.

The docs script is a documentation generator/updater. It:

  • uses ast-grep to find and generate missing documentation for exported TypeScript functions. A second LLM-as-judge request is used to check that the generated documentation is correct.
  • if the diff option is selected, filters out functions that do not intersect with the diff (this is rather naive but a good start…).
  • can also be used to update the documentation of a function that has changed.
  • works regardless of the file size or the number of files, as most transformations are hyper-localized.

generate and refresh my docs plz
npm run genaiscript docs -- --diff

The script has been applied one-shot (no human edits, multiple edits per file) to real files. Here is the full script:

docs.genai.mts
import { classify } from "genaiscript/runtime"
import { docify } from "./src/docs.mts"
import { prettier } from "./src/prettier.mts"

script({
    title: "Generate TypeScript function documentation using AST insertion",
    description: `
## Docs!

This script generates and updates TypeScript function documentation using an AST/LLM hybrid approach.
It uses ast-grep to look for undocumented and documented functions,
then uses a combination of LLM, and LLM-as-a-judge to generate and validate the documentation.
It also uses prettier to format the code before and after the generation.

By default,

- no edits are applied on disk. It is recommended to
  run this script with \`--vars 'applyEdits=true'\` to apply the edits.
- if a diff is available, it will only process the files with changes.
`,
    accept: ".ts",
    files: "src/cowsay.ts",
    parameters: {
        diff: {
            type: "boolean",
            default: false,
            description:
                "If true, the script will only process files with changes with respect to main.",
        },
        pretty: {
            type: "boolean",
            default: false,
            description:
                "If true, the script will prettify the files before analysis.",
        },
        applyEdits: {
            type: "boolean",
            default: false,
            description: "If true, the script will apply the edits to the files.",
        },
        missing: {
            type: "boolean",
            default: true,
            description: "Generate missing docs.",
        },
        update: {
            type: "boolean",
            default: true,
            description: "Update existing docs.",
        },
        maxFiles: {
            type: "integer",
            description: "Maximum number of files to process.",
        },
    },
})
const { output, dbg, vars } = env
let { files } = env
const { applyEdits, diff, pretty, missing, update, maxFiles } = vars
dbg({ applyEdits, diff, pretty, missing, update, maxFiles })
if (!missing && !update) cancel(`not generating or updating docs, exiting...`)
if (!applyEdits)
    output.warn(
        `edit not applied, use --vars 'applyEdits=true' to apply the edits`
    )

// filter by diff
const gitDiff = diff ? await git.diff({ base: "main" }) : undefined
console.debug(gitDiff)
const diffFiles = gitDiff ? DIFF.parse(gitDiff) : undefined
if (diffFiles?.length) {
    dbg(`diff files: ${diffFiles.map((f) => f.to)}`)
    files = files.filter(({ filename }) =>
        diffFiles.some((f) => path.resolve(f.to) === path.resolve(filename))
    )
    dbg(`diff filtered files: ${files.length}`)
}
if (maxFiles && files.length > maxFiles) {
    dbg(`random slicing files to ${maxFiles}`)
    files = parsers.tidyData(files, {
        sliceSample: maxFiles,
    }) as WorkspaceFile[]
}

const sg = await host.astGrep()
const stats = []
for (const file of files) {
    console.debug(file.filename)
    // normalize spacing
    if (pretty) await prettier(file)
    // generate missing docs
    if (missing) {
        stats.push({
            filename: file.filename,
            kind: "new",
            gen: 0,
            genCost: 0,
            judge: 0,
            judgeCost: 0,
            edits: 0,
            updated: 0,
        })
        await generateDocs(file, stats.at(-1))
    }
    // generate updated docs
    if (update) {
        stats.push({
            filename: file.filename,
            kind: "update",
            gen: 0,
            genCost: 0,
            judge: 0,
            judgeCost: 0,
            edits: 0,
            updated: 0,
        })
        await updateDocs(file, stats.at(-1))
    }
}
if (stats.length)
    output.table(
        stats.filter((row) =>
            Object.values(row).some((d) => typeof d === "number" && d > 0)
        )
    )
async function generateDocs(file: WorkspaceFile, fileStats: any) {
    const { matches: missingDocs } = await sg.search(
        "ts",
        file.filename,
        {
            rule: {
                kind: "export_statement",
                not: {
                    follows: {
                        kind: "comment",
                        stopBy: "neighbor",
                    },
                },
                has: {
                    kind: "function_declaration",
                },
            },
        },
        { diff: gitDiff, applyGitIgnore: false }
    )
    dbg(`found ${missingDocs.length} missing docs`)
    const edits = sg.changeset()
    // for each match, generate a docstring for functions not documented
    for (const missingDoc of missingDocs) {
        const res = await runPrompt(
            (_) => {
                _.def("FILE", missingDoc.getRoot().root().text())
                _.def("FUNCTION", missingDoc.text())
                // this needs more eval-ing
                _.$`Generate a function documentation for <FUNCTION>.
                - Make sure parameters are documented.
                - Be concise. Use technical tone.
                - do NOT include types, this is for TypeScript.
                - Use docstring syntax. do not wrap in markdown code section.

                The full source of the file is in <FILE> for reference.`
            },
            {
                model: "large",
                responseType: "text",
                label: missingDoc.text()?.slice(0, 20) + "...",
            }
        )
        fileStats.gen += res.usage?.total || 0
        fileStats.genCost += res.usage?.cost || 0
        // if generation failed, skip this function
        if (res.error) {
            output.warn(res.error.message)
            continue
        }
        const docs = docify(res.text.trim())
        // sanity check: LLM-as-a-judge validates the generated docs
        const judge = await classify(
            (_) => {
                _.def("FUNCTION", missingDoc.text())
                _.def("DOCS", docs)
            },
            {
                ok: "The content in <DOCS> is an accurate documentation for the code in <FUNCTION>.",
                err: "The content in <DOCS> does not match with the code in <FUNCTION>.",
            },
            {
                model: "small",
                responseType: "text",
                temperature: 0.2,
                systemSafety: false,
                system: ["system.technical", "system.typescript"],
            }
        )
        fileStats.judge += judge.usage?.total || 0
        fileStats.judgeCost += judge.usage?.cost || 0
        if (judge.label !== "ok") {
            output.warn(judge.label)
            output.fence(judge.answer)
            continue
        }
        // if generation is successful, insert the docs
        const updated = `${docs}\n${missingDoc.text()}`
        edits.replace(missingDoc, updated)
        fileStats.edits++
    }
    // apply all edits and write to the file
    const modifiedFiles = edits.commit()
    if (!modifiedFiles?.length) {
        dbg("no edits to apply")
        return
    }
    fileStats.updated = 1
    if (applyEdits) {
        await workspace.writeFiles(modifiedFiles)
        await prettier(file)
    } else {
        output.diff(file, modifiedFiles[0])
    }
}
async function updateDocs(file: WorkspaceFile, fileStats: any) {
    const { matches } = await sg.search(
        "ts",
        file.filename,
        YAML`
rule:
  kind: "export_statement"
  follows:
    kind: "comment"
    stopBy: neighbor
  has:
    kind: "function_declaration"
`,
        { diff: gitDiff, applyGitIgnore: false }
    )
    dbg(`found ${matches.length} docs to update`)
    const edits = sg.changeset()
    // for each match, update the docstring of the documented function
    for (const match of matches) {
        const comment = match.prev()
        const res = await runPrompt(
            (_) => {
                _.def("FILE", match.getRoot().root().text(), { flex: 1 })
                _.def("DOCSTRING", comment.text(), { flex: 10 })
                _.def("FUNCTION", match.text(), { flex: 10 })
                // this needs more eval-ing
                _.$`Update the docstring <DOCSTRING> to match the code in function <FUNCTION>.
                - If the docstring is up to date, return /NOP/.
                - do not rephrase an existing sentence if it is correct.
                - Make sure parameters are documented.
                - do NOT include types, this is for TypeScript.
                - Use docstring syntax. do not wrap in markdown code section.
                - Minimize updates to the existing docstring.

                The full source of the file is in <FILE> for reference.
                The source of the function is in <FUNCTION>.
                The current docstring is <DOCSTRING>.`
            },
            {
                model: "large",
                responseType: "text",
                flexTokens: 12000,
                label: match.text()?.slice(0, 20) + "...",
                temperature: 0.2,
                systemSafety: false,
                system: ["system.technical", "system.typescript"],
            }
        )
        fileStats.gen += res.usage?.total || 0
        fileStats.genCost += res.usage?.cost || 0
        // if generation failed, skip this function
        if (res.error) {
            output.warn(res.error.message)
            continue
        }
        if (res.text.includes("/NOP/")) continue
        const docs = docify(res.text.trim())
        // ask LLM-as-a-judge if the change is worth applying
        const judge = await classify(
            (_) => {
                _.def("FUNCTION", match.text())
                _.def("ORIGINAL_DOCS", comment.text())
                _.def("NEW_DOCS", docs)
                _.$`An LLM generated an updated docstring <NEW_DOCS> for function <FUNCTION>. The original docstring is <ORIGINAL_DOCS>.`
            },
            {
                APPLY: "The <NEW_DOCS> is a significant improvement to <ORIGINAL_DOCS>.",
                NIT: "The <NEW_DOCS> contains nits (minor adjustments) to <ORIGINAL_DOCS>.",
            },
            {
                model: "large",
                responseType: "text",
                temperature: 0.2,
                systemSafety: false,
                system: ["system.technical", "system.typescript"],
            }
        )
        fileStats.judge += judge.usage?.total || 0
        fileStats.judgeCost += judge.usage?.cost || 0
        if (judge.label === "NIT") {
            output.warn("LLM suggests minor adjustments, skipping")
            continue
        }
        edits.replace(comment, docs)
        fileStats.edits++
    }
    // apply all edits and write to the file
    const modifiedFiles = edits.commit()
    if (!modifiedFiles?.length) {
        dbg("no edits to apply")
        return
    }
    fileStats.updated = 1
    if (applyEdits) {
        await workspace.writeFiles(modifiedFiles)
        await prettier(file)
    } else {
        output.diff(file, modifiedFiles[0])
    }
}

The latest Azure AI Foundry innovations help you optimize AI investments and differentiate your business


Over the last couple of years, I’ve seen tech teams go from feeling excited yet overwhelmed by the blistering pace of AI advancement to now helping bend and direct that curve of innovation using the cutting-edge capabilities of Azure AI Foundry.

This rapid transformation underscores the critical role of a robust enterprise AI platform to help you push the AI curve. We continually add capabilities to Azure AI Foundry to empower your teams to do just that. That means business leaders in the era of AI have a lot to consider—it’s easy to get lost in the forest when all the new trees keep making it bigger. 

Today I’m sharing a couple of the most important Azure AI Foundry innovations announced in recent weeks that keep the forest in view: they improve operational efficiency and maximize your investments so that you can focus on differentiating in a competitive landscape.

Tools to help power your agentic future 

AI agents have the potential to transform every business process—revolutionizing productivity by automating routine tasks and enabling employees to focus on more strategic work. We’ve announced several agentic capabilities and tools on Azure AI Foundry to help you efficiently put AI agents to work in your organization. 

New knowledge tools with Azure AI Agent Service securely ground AI agent outputs with enterprise knowledge, for accurate, relevant, and contextually aware responses. Azure AI Agent Service provides a wide range of knowledge tools for various data types, including unstructured, structured, private, licensed, and public web data.

And Microsoft Fabric data agents were announced today at the Microsoft Fabric Community Conference to allow developers using Azure AI Agent Service to connect to customized, conversational agents created in Microsoft Fabric. These data agents can reason over and unlock insights from various enterprise structured and semantic data sources, enabling better data-driven decisions. Fabric data agents retrieve, understand, and synthesize data from OneLake, determining when to use specific data and how to combine it.

Combining Fabric’s sophisticated enterprise data analysis capabilities with Azure AI Foundry’s cutting-edge GenAI technology means you can create custom conversational AI agents leveraging domain expertise. And the Fabric-Foundry pathway connects your data teams with your dev teams, putting them on a common, secure, and enterprise-ready AI platform. 

One customer making use of the Microsoft Fabric–Azure AI Foundry bridge is NTT DATA, which leverages data agents in Microsoft Fabric to have conversations with HR and back-office operations data and better understand what is happening in the organization.

We also recently announced two more capabilities to empower businesses to deploy AI not just as an assistant, but as an active digital workforce:

Responses API is a powerful tool enabling AI-powered apps to seamlessly retrieve information, process data, and then act. It simplifies complex tasks, allowing your business to operate more efficiently and ultimately reduce costs. 
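To make the idea concrete, here is a minimal sketch of calling a Responses-style endpoint from TypeScript. Treat it as an assumption-laden illustration: the resource URL, API version, and model name are placeholders, and the exact Azure surface may differ from what is shown; consult the Azure OpenAI documentation for the real values.

// Hypothetical call to a Responses-style endpoint (placeholders throughout).
async function main() {
    const res = await fetch(
        // placeholder resource URL and API version, not documented values
        "https://YOUR-RESOURCE.openai.azure.com/openai/responses?api-version=preview",
        {
            method: "POST",
            headers: {
                "Content-Type": "application/json",
                "api-key": process.env.AZURE_OPENAI_API_KEY ?? "", // Azure OpenAI key auth
            },
            body: JSON.stringify({
                model: "gpt-4o", // your deployed model name (placeholder)
                input: "Find invoices more than 30 days overdue and draft reminder emails.",
            }),
        }
    )
    console.log(await res.json())
}
main().catch(console.error)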

Computer-using agent, or CUA, is a breakthrough AI model that can navigate software interfaces, execute tasks, and automate workflows. It can open applications, click buttons, fill out forms, and navigate multi-page workflows. CUA can adapt dynamically to changes for smooth operations across both web and desktop applications, integrating disparate systems without API dependencies.

Enhancing AI efficiency and performance with Azure AI Foundry 

The only thing growing as fast as generative AI technology is the number of use cases for it across your organization, along with the need for tools to optimize efficiency and performance. Azure AI Foundry includes a suite of governance tools and controls to monitor and manage costs, compliance, performance, and more. We also added NVIDIA NIM microservices and NVIDIA AgentIQ toolkit to unlock unprecedented efficiency, performance, and cost optimization for your AI projects. 

Part of the NVIDIA AI Enterprise software suite, NVIDIA NIM is a suite of easy-to-use microservices engineered for secure, reliable, and high-performance AI inferencing, built to scale seamlessly on managed Azure compute, providing:

  • Zero-configuration deployment: Get started quickly with out-of-the-box optimization. 
  • Seamless Azure integration: Works effortlessly with Azure AI Agent Service and Semantic Kernel. 
  • Enterprise-grade reliability: Benefit from NVIDIA AI Enterprise support for continuous performance and security. 
  • Scalable inference: Tap into Azure’s NVIDIA accelerated infrastructure for demanding workloads. 
  • Optimized workflows: Accelerate applications ranging from large language models to advanced analytics. 

Stay agile and performant with Azure OpenAI Service Provisioned spillover 

Provisioned (PTU) spillover is a new feature in Azure OpenAI Service that helps ensure consistent and efficient performance of AI applications, even during high usage periods.

Now in public preview, PTU spillover automatically reroutes excess traffic from your provisioned deployments to help maintain smooth service operation and uninterrupted critical processes. This feature gives you the flexibility to manage unexpected traffic bursts or peak demand season without compromising performance so you can adapt to dynamic conditions and maximize your AI investments. 

New report: Customized generative AI experiences to differentiate your business

One way we see more and more companies using AI to push the curve on innovation is by leaning into customization capabilities that can create distinctive experiences or services that help their business stand out in the competitive market.


We recently released a report, DIY GenAI: Customizing generative AI for unique value, that details how businesses are using capabilities like fine-tuning, retrieval-augmented generation (RAG), and agentic specialization to differentiate. After all, the world’s most powerful AI models don’t know anything about your specific business, so it’s your unique business data and customization that help you differentiate from the competition.

The report also highlights the motivations, methods, and challenges faced by technology leaders as they tailor AI models to create net-new value for their businesses.

The findings are worth reading because they offer a glimpse at where AI development is going in the future. I’m confident that what feels custom today will most likely be the norm faster than we can all believe.

Build a more accessible world with Azure AI Foundry 

Microsoft has a long-standing legacy of building inclusive technologies, from early screen readers to speech-to-text innovations. This commitment to accessibility is well in line with our mission as a company—and it’s now being realized in Azure AI Foundry where it’s integrated right into the AI development lifecycle.

Accessibility and inclusivity in AI are essential for any business because they can help expand your reach, boost customer satisfaction, and even enhance your reputation for social responsibility. Put simply, prioritizing these values can drive innovation and your long-term success. 

Kickstart your AI transformation with Azure AI Foundry 

Can you believe that was just March? We delivered so much more! From exciting developments in Azure AI Foundry to our comprehensive approach to trustworthy AI, we’re here to support you and lead through this fast-paced era of AI. We’re proud to offer a comprehensive platform with quality, flexibility, security, safety, and choice. When it’s time to invest in AI transformation for your business, you can trust that the latest innovations are ready and waiting for you on Azure AI Foundry.


About Jessica 

Jessica leads Azure Data, AI, and Digital Applications at Microsoft. Find Jessica’s blog posts here and be sure to follow Jessica on LinkedIn.

