
Developers Testing More, JetBrains Study Finds


Developers are doing more testing, according to a recent JetBrains report on the State of Developer Ecosystems.

The percentage of developers who test rose from 85% in 2023 to 95% in 2024. The proportion of developers doing unit tests, integration tests and end-to-end tests also rose.

However, only 18% are using artificial intelligence in their testing software.

The survey also looked at whether AI provides people with more time to code. Users overwhelmingly say that saving time and doing things faster are the top benefits of using AI tools for development.

Sixty-five percent said they spend more than half of their work time coding, up from 57% in 2023. Half of those who use AI tools save at least 2 hours a week. In contrast, 4% say they don't save any time at all by using these tools, and another 46% save no more than 2 hours a week.

It’s worth noting that only 23% say using AI tools for coding actually improves the quality of the code and solutions being created.

It also seems that previous estimates of GitHub Copilot use may have been overstated.

In 2024, JetBrains asked specifically whether people had used particular AI tools for coding and other development activities. When asked this way, rather than about use for any purpose, GitHub Copilot usage fell from 46% to 26% and ChatGPT use fell from 70% to 49%.

The above section was written by Lawrence Hecht, TNS Analyst.

Weaviate Offers Hosted Embedded Service for AI Applications

Vector database company Weaviate launched a new hosted embedding service for AI applications this month. Called Weaviate Embeddings, the service supports both open source and proprietary embedding models. It gives developers full control over their embeddings, allowing them to switch between models. Also, it does not have a rate limit on embeddings per second in production environments.

The service is hosted in Weaviate Cloud and runs on GPUs.

Tabnine Feature Flags Unlicensed Code in AI-Generated Software

Tabnine, creator of the original AI code assistant, introduced a feature called Code Provenance and Attribution that checks AI-generated code to see if there are potential IP or copyright issues with the code.

It checks the code against publicly visible GitHub code and flags any matches. The code checker references the source repository, as well as the license type, which makes it easy for a developer to determine if it can be used based on the organization’s specific standards and requirements.
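Conceptually, this kind of provenance check resembles indexing known open source code together with its repository and license, then flagging overlaps in generated output. The sketch below is a simplified, hypothetical illustration of that idea, not Tabnine's implementation or API.

```
from collections import defaultdict

# index: shingle of k consecutive code lines -> list of (repo, license) sources
index = defaultdict(list)

def shingles(code, k=3):
    lines = [l.strip() for l in code.splitlines() if l.strip()]
    return {"\n".join(lines[i:i + k]) for i in range(len(lines) - k + 1)}

def index_source(code, repo, license_id, k=3):
    # register known open source code along with where it came from
    for s in shingles(code, k):
        index[s].append((repo, license_id))

def check_generated(code, k=3):
    # return the repos/licenses that overlap with the generated code
    hits = defaultdict(int)
    for s in shingles(code, k):
        for source in index.get(s, []):
            hits[source] += 1
    return dict(hits)  # e.g. {("github.com/example/lib", "GPL-3.0"): 2}
```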

Tabnine soon expects to add the capability to allow users to identify specific repos, such as those maintained by competitors, and then have Tabnine check generated code against them as well. It also plans to add censorship capability, allowing Tabnine administrators to remove matching code before it is displayed to the developer.

Right now, Code Provenance and Attribution is in private preview and open to any Tabnine enterprise customer. It works with all available models.

Google Launches Gemini 2.0 Flash and JavaScript/Python Code Assistant

Google has updated its Gemini Flash model. Gemini 2.0 Flash is twice as fast as 1.5 Pro, the company said. It also introduced the Multimodal Live API for building dynamic applications with real-time audio and video streaming, according to the blog post.

Developers can use Gemini 2.0 Flash to generate responses that can include text, audio and images through an API call. Gemini 2.0 Flash can be accessed using the Gemini API in Google AI Studio and Vertex AI. Right now it’s experimental, but general availability is expected next year.
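For example, a minimal call through the Gemini API might look like the sketch below, assuming the google-generativeai Python SDK and an experimental model id; both are assumptions, so check Google AI Studio for the current names.

```
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key
model = genai.GenerativeModel("gemini-2.0-flash-exp")  # assumed experimental model id
response = model.generate_content("Explain streaming APIs in two sentences.")
print(response.text)
```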

Gemini 2.0 is trained to use tools, which Google noted is a foundational capability for building AI agentic “experiences.” It can natively call tools like Google Search and code execution in addition to custom third-party functions via function calling.

Using Google Search natively as a tool leads to more factual and comprehensive answers and increases traffic to publishers, the post added.

“Multiple searches can be run in parallel leading to improved information retrieval by finding more relevant facts from multiple sources simultaneously and combining them for accuracy,” the post stated.

Google also introduced an experimental AI-powered code agent called Jules, which can handle Python and JavaScript coding tasks.

“Working asynchronously and integrated with your GitHub workflow, Jules handles bug fixes and other time-consuming tasks while you focus on what you actually want to build,” the post stated. “Jules creates comprehensive, multistep plans to address issues, efficiently modifies multiple files, and even prepares pull requests to land fixes directly back into GitHub.”

Right now Jules is available for a “select group of trusted testers,” but plans are to make it available for other developers in early 2025.

Finally, there is a trusted tester program developers can join to try out the Colab data science agent. It allows developers to describe their analysis goals in plain language, and then it builds a Colab notebook. It’s expected to be more widely available in the first half of 2025.

The post Developers Testing More, JetBrains Study Finds appeared first on The New Stack.


AI Writing Is Improving, But It Still Can't Match Human Creativity

sciencehabit shares a report from Science Magazine: With a few keystrokes, anyone can ask an artificial intelligence (AI) program such as ChatGPT to write them a term paper, a rap song, or a play. But don't expect William Shakespeare's originality. A new study finds such output remains derivative -- at least for now. [...]

[O]bjectively testing this creativity has been tricky. Scientists have generally taken two tacks. One is to use another computer program to search for signs of plagiarism -- though a lack of plagiarism does not necessarily equal creativity. The other approach is to have humans judge the AI output themselves, rating factors such as fluency and originality. But that's subjective and time intensive.

So Ximing Lu, a computer scientist at the University of Washington, and colleagues created a program featuring both objectivity and a bit of nuance. Called DJ Search, it collects pieces of text of a minimum length from whatever the AI outputs and searches for them in large online databases. DJ Search doesn't just look for identical matches; it also scans for strings whose words have similar meanings. To evaluate the meaning of a word or phrase, the program itself relies on a separate AI algorithm that produces a set of numbers called an "embedding," which roughly represents the contexts in which words are typically found. Synonymous words have numerically close embeddings. For example, phrases that swap "anticipation" and "excitement" are considered matches.

After removing all matches, the program calculates the ratio of the remaining words to the original document length, which should give an estimate of how much of the AI's output is novel. The program conducts this process for various string lengths (the study uses a minimum of five words) and combines the ratios into one index of linguistic novelty. (The team calls it a "creativity index," but creativity requires both novelty and quality -- random gibberish is novel but not creative.)

The researchers compared the linguistic novelty of published novels, poetry, and speeches with works written by recent LLMs. Humans outscored AIs by about 80% in poetry, 100% in novels, and 150% in speeches, the researchers report in a preprint posted on OpenReview and currently under peer review. Although DJ Search was designed for comparing people and machines, it can also be used to compare two or more humanmade works. For example, Suzanne Collins's 2008 novel The Hunger Games scored 35% higher in linguistic originality than Stephenie Meyer's 2005 hit Twilight. (You can try the tool online.)
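For intuition, here is a rough sketch of the kind of embedding-based n-gram matching the article describes, using the sentence-transformers library. This is not DJ Search itself; the reference corpus, embedding model, and similarity threshold are illustrative assumptions.

```
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

def ngrams(text, n=5):
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def novelty(candidate, reference_corpus, n=5, threshold=0.9):
    cand = ngrams(candidate, n)
    ref = [g for doc in reference_corpus for g in ngrams(doc, n)]
    if not cand or not ref:
        return 1.0
    # normalized embeddings -> dot product is cosine similarity
    cand_emb = model.encode(cand, normalize_embeddings=True)
    ref_emb = model.encode(ref, normalize_embeddings=True)
    sims = cand_emb @ ref_emb.T
    matched = sims.max(axis=1) >= threshold
    # fraction of n-grams with no close match in the reference corpus
    return float((~matched).mean())

print(novelty("the quick brown fox jumps over the lazy dog at midnight tonight",
              ["a quick brown fox jumped over a lazy dog yesterday morning"]))
```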

Read more of this story at Slashdot.


Qualcomm Processors Properly Licensed From Arm, US Jury Finds

Jurors delivered a mixed verdict on Friday, ruling that Qualcomm had properly licensed its central processor chips from Arm. This decision effectively concludes Arm's lawsuit against Qualcomm, which had the potential to disrupt the global smartphone and PC chip markets. The dispute stemmed from Qualcomm's $1.4 billion acquisition of chip startup Nuvia in 2021. Arm claimed Qualcomm breached contract terms by using Nuvia's designs without permission, while Qualcomm maintained its existing agreement covers the acquired technology. Arm demanded Qualcomm destroy the Nuvia designs created before the acquisition.

Reuters reports: An eight-person jury in U.S. federal court deadlocked on the question of whether Nuvia, a startup that Qualcomm purchased for $1.4 billion in 2021, breached the terms of its license with Arm. But the jury found that Qualcomm did not breach Nuvia's license with Arm. The jury also found that Qualcomm's chips created using Nuvia technology, which have been central to Qualcomm's push into the personal computer market, are properly licensed under its own agreement with Arm, clearing the way for Qualcomm to continue selling them.

Read more of this story at Slashdot.


OpenAI teases new reasoning model—but don’t expect to try it soon

Image: An OpenAI logo over an illustration of its o1 model (Alex Parkin / The Verge)

For the last day of ship-mas, OpenAI previewed a new set of frontier “reasoning” models dubbed o3 and o3-mini. The Verge first reported that a new reasoning model would be coming during this event.

The company isn’t releasing these models today (and admits final results may evolve with more post-training). However, OpenAI is accepting applications from the research community to test these systems ahead of a public release, for which it has not yet set a date. OpenAI launched o1 (codenamed Strawberry) in September and is jumping straight to o3, skipping o2 to avoid confusion (or trademark conflicts) with the British telecom company called O2.

The term reasoning has become a common buzzword in the AI industry lately, but it basically means the machine breaks down instructions into smaller tasks that can produce stronger outcomes. These models often show the work for how they got to an answer, rather than just giving a final answer without explanation.

According to the company, o3 surpasses previous performance records across the board. It beats its predecessor in coding tests (called SWE-Bench Verified) by 22.8 percent and outscores OpenAI’s Chief Scientist in competitive programming. The model nearly aced one of the hardest math competitions (called AIME 2024), missing one question, and achieved 87.7 percent on a benchmark for expert-level science problems (called GPQA Diamond). On the toughest math and reasoning challenges that usually stump AI, o3 solved 25.2 percent of problems (where no other model exceeds 2 percent).

OpenAI claims o3 performs better than its other reasoning models in coding benchmarks.

The company also announced new research on deliberative alignment, which requires the AI model to process safety decisions step-by-step. So, instead of just giving yes/no rules to the AI model, this paradigm requires it to actively reason about whether a user’s request fits OpenAI’s safety policies. The company claims that when it tested this on o1, it was much better at following safety guidelines than previous models, including GPT-4.
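As a loose illustration of the pattern described above (not OpenAI's implementation), a safety check along these lines can be prompted rather than hard-coded as yes/no rules; the model name and policy text in this sketch are placeholders.

```
from openai import OpenAI

client = OpenAI()
POLICY = "Refuse requests for instructions that enable physical harm."

def moderate(request: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[
            {"role": "system",
             "content": f"Safety policy:\n{POLICY}\n"
                        "Reason step by step about whether the request "
                        "complies with the policy, then answer ALLOW or REFUSE."},
            {"role": "user", "content": request},
        ],
    )
    # the response contains the model's reasoning plus its final decision
    return resp.choices[0].message.content

print(moderate("How do I sharpen a kitchen knife safely?"))
```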


AIOpsLab: Building AI agents for autonomous clouds


In our increasingly complex digital landscape, enterprises and cloud providers face significant challenges in the development, deployment, and maintenance of sophisticated IT applications. The broad adoption of microservices and cloud-based serverless architecture has streamlined certain aspects of application development while simultaneously introducing a host of operational difficulties, particularly in fault diagnosis and mitigation. These complexities can result in outages, which have the potential to cause major business disruptions, underscoring the critical need for robust solutions that ensure high availability and reliability in cloud services. As the expectation for five-nines availability grows, organizations must navigate the intricate web of operational demands to maintain customer satisfaction and business continuity. 

To tackle these challenges, recent research on using AIOps agents for cloud operations—such as AI agents for incident root cause analysis (RCA) or triaging—has relied on proprietary services and datasets. Other prior works use frameworks specific to the solutions that they are building, or ad hoc and static benchmarks and metrics that fail to capture the dynamic nature of real-world cloud services. Furthermore, current approaches do not agree on standard metrics or a standard taxonomy for operational tasks. This calls for a standardized and principled research framework for building, testing, comparing, and improving AIOps agents. The framework should allow agents to interact with realistic service operation tasks in a reproducible manner. It must be flexible in extending to new applications, workloads, and faults. Importantly, it should go beyond just evaluating the AI agents and enable users to improve the agents themselves; for example, by providing sufficient observability and even serving as a training environment (“gym”) to generate samples to learn on. Users developing agents for cloud operations tasks with Azure AI Agent Service can evaluate and improve them using AIOpsLab.

We developed AIOpsLab, a holistic evaluation framework for researchers and developers that enables the design, development, evaluation, and enhancement of AIOps agents and also provides reproducible, standardized, interoperable, and scalable benchmarks. AIOpsLab is open sourced on GitHub under the MIT license, so that researchers and engineers can leverage it to evaluate AIOps agents at scale. The AIOpsLab research paper has been accepted at SoCC’24 (the annual ACM Symposium on Cloud Computing).

Figure 1. System architecture of AIOpsLab: the orchestrator sits between AIOps tasks (e.g., SocialNetwork, HotelReservation, E-Commerce) and the service under test, coordinating the agent, the problem cache, and the workload and fault generators, while the service reports state back through traces, metrics, and logs.

Agent-cloud interface (ACI)

AIOpsLab strictly separates the agent and the application service using an intermediate orchestrator. It provides several interfaces for other system parts to integrate and extend. First, it establishes a session with an agent to share information about benchmark problems: (1) the problem description, (2) instructions (e.g., response format), and (3) available APIs to call as actions.

The APIs are a set of documented tools, e.g., get logs, get metrics, and exec shell, designed to help the agent solve a task. There are no restrictions on the agent’s implementation; the orchestrator poses problems and polls it for the next action to perform given the previous result. Each action must be a valid API call, which the orchestrator validates and carries out. The orchestrator has privileged access to the deployment and can take arbitrary actions (e.g., scale-up, redeploy) using appropriate tools (e.g., helm, kubectl) to resolve problems on behalf of the agent. Lastly, the orchestrator calls workload and fault generators to create service disruptions, which serve as live benchmark problems. AIOpsLab provides additional APIs to extend to new services and generators. 


The following example shows how to onboard an agent to AIOpsLab:

import asyncio

from aiopslab import Orchestrator

class Agent:
    def __init__(self, prob, instructs, apis):
        # set_prompt and GPT4 are placeholders for your own prompt
        # construction and LLM client
        self.prompt = self.set_prompt(prob, instructs, apis)
        self.llm = GPT4()

    async def get_action(self, state: str) -> str:
        # return the next action for the orchestrator to validate and execute
        return self.llm.generate(self.prompt + state)

# initialize the orchestrator and pick a benchmark problem
orch = Orchestrator()
pid = "misconfig_app_hotel_res-mitigation-1"
prob_desc, instructs, apis = orch.init_problem(pid)

# register the agent and run the problem (up to 10 agent steps)
agent = Agent(prob_desc, instructs, apis)
orch.register_agent(agent, name="myAgent")
asyncio.run(orch.start_problem(max_steps=10))

Service

AIOpsLab abstracts a diverse set of services to reflect the variance in production environments. This includes live, running services that are implemented using various architectural principles, including microservices, serverless, and monolithic.

We also leverage open-sourced application suites such as DeathStarBench as they provide artifacts, like source code and commit history, along with run-time telemetry. Adding tools like BluePrint can help AIOpsLab scale to other academic and production services. 

Workload generator

The workload generator in AIOpsLab plays a crucial role by creating simulations of both faulty and normal scenarios. It receives specifications from the orchestrator, such as the task, desired effects, scale, and duration. The generator can use a model trained on real production traces to generate workloads that align with these specifications. Faulty scenarios may simulate conditions like resource exhaustion, exploit edge cases, or trigger cascading failures, inspired by real incidents. Normal scenarios mimic typical production patterns, such as daily activity cycles and multi-user interactions. When various characteristics (e.g., service calls, user distribution, arrival times) can lead to the desired effect, multiple workloads can be stored in the problem cache for use by the orchestrator. In coordination with the fault generator, the workload generator can also create complex fault scenarios with workloads.  
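As an illustration only, a workload specification of the kind described above might carry fields like these; this is a hypothetical shape, not AIOpsLab's actual interface (see the GitHub repository for the real one).

```
from dataclasses import dataclass

@dataclass
class WorkloadSpec:
    task: str          # e.g. "detection" or "mitigation"
    effect: str        # e.g. "normal" or "resource_exhaustion"
    scale: int         # e.g. concurrent users or request rate
    duration_s: int    # how long the workload should run

spec = WorkloadSpec(task="detection", effect="normal", scale=100, duration_s=300)
print(spec)
```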

Fault generator

AIOpsLab has a novel push-button fault generator designed for generic applicability across various cloud scenarios. Our approach integrates application and domain knowledge to create adaptable policies and “oracles” compatible with AIOps scenarios. This includes fine-grained fault injection capable of simulating complex failures inspired by production incidents. Additionally, it can inject faults at various system levels, exposing root causes while maintaining semantic integrity and considering interdependencies between cloud microservices. The fault injector’s versatility can enhance the reliability and robustness of cloud systems by enabling thorough testing and evaluation of AIOps capabilities. 


Observability

AIOpsLab is equipped with an extensible observability layer designed to provide comprehensive monitoring capabilities across various system layers for any AIOps tool. AIOpsLab collects a wide array of telemetry data, including (1) traces from Jaeger detailing the end-to-end paths of requests through distributed systems, (2) application logs formatted and recorded by Filebeat and Logstash, and (3) system metrics monitored by Prometheus. Additionally, AIOpsLab also captures lower-level system information such as syscall logs and cluster information. As mentioned, we handle potential data overload through flexible APIs to tune the telemetry data relevant to the AIOps tools. 
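As a small example of consuming one of these telemetry sources, the sketch below queries Prometheus's HTTP API for a CPU metric; the endpoint and metric name are assumptions for illustration.

```
import requests

PROM = "http://localhost:9090"  # assumed Prometheus endpoint

def query_prometheus(promql: str):
    resp = requests.get(f"{PROM}/api/v1/query", params={"query": promql})
    resp.raise_for_status()
    return resp.json()["data"]["result"]

# print per-pod CPU usage rates over the last 5 minutes
for series in query_prometheus("rate(container_cpu_usage_seconds_total[5m])"):
    print(series["metric"].get("pod"), series["value"])
```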

AIOpsLab currently supports four key tasks within the AIOps domain: incident detection, localization, root cause diagnosis, and mitigation. Additionally, we support several popular agent frameworks: ReAct, AutoGen, and TaskWeaver. Two key insights from the study emphasize the importance of observability and a well-designed ACI. Observability is crucial for clear root-cause diagnosis; for instance, pinpointing a misconfigured API gateway can be vital in preventing service downtime.

Flexibility is another critical factor; the ability to execute arbitrary shell commands allowed for effective troubleshooting in real-time scenarios. Lastly, robust error handling is essential—providing agents with high-quality feedback on execution barriers, such as a failed database connection, ensures swift resolution and continuous improvement. 


Example of the ACI on a diagnosis task

Agent: I need to first look at the logs to understand if there are any errors or warnings that could indicate problems.

Action:

```
get_logs("compose-post-service", "test-social-network")
```

Service: [2024-Aug-04 23:18:49.365494] <info>:
…
Thrift: Sun Aug 4 23:19:19 2024 TSocket::open() connect()
<Host: user-service Port: 9090>: Connection refused

Agent: … unable to connect to the `user-service` at port 9090, even though the pod is running. This suggests a possible network issue or misconfiguration in service discovery.

Next steps

This research project adopts Microsoft security standards and Responsible AI principles, and we envision this research evolving into a vital resource for organizations aiming to optimize their IT operations. Additionally, we plan to collaborate with various generative AI teams to incorporate AIOpsLab as a benchmark scenario for evaluating state-of-the-art models. By doing so, we aim to foster innovation and encourage the development of more advanced AIOps solutions. This research is essential not only for IT professionals but also for anyone invested in the future of technology, as it has the potential to redefine how organizations manage operations, respond to incidents, and ultimately serve their customers in an increasingly automated world. 

Acknowledgements

We would like to thank Yinfang Chen, Manish Shetty, Yogesh Simmhan, Xuchao Zhang, Jonathan Mace, Dax Vandevoorde, Pedro Las-Casas, Shachee Mishra Gupta, and Suman Nath, for contributing to this project.


The post AIOpsLab: Building AI agents for autonomous clouds appeared first on Microsoft Research.


December 2024 update of Power Automate for desktop


We are happy to announce that the December 2024 update of Power Automate for desktop (version 2.51) has been released! You can download the latest release here.

Key features

New features and updates have been added, as described below.

Repairing flow automation errors is now available (in preview)

A new flow property has been introduced in preview, allowing you to repair flow errors related to missing UI element selectors when the flow is invoked from the cloud in attended or unattended mode.

Repairing can take place either with a Copilot-suggested fix that needs to be reviewed before it is approved, or with a manual repair during the flow run. The property is enabled by default when you create a new desktop flow.

This functionality is currently only available in US-based environments for work or school accounts.

Copilot can now be used to repair automation errors that are related to UI element selector issues

Dark mode is now available (in preview)

Dark mode is now available for Power Automate for desktop in preview and can be enabled through the corresponding option in the settings of the console.

Dark mode can now be selected from the console settings

Get started

We hope that you will find the above updates useful. Please feel free to provide your questions and feedback in the Power Automate Community. If you want to learn more about Power Automate for desktop, get started with the resources below:

The post December 2024 update of Power Automate for desktop appeared first on Microsoft Power Platform Blog.
