Put a capable coding model inside a developer’s primary workspace, and the IDE stops being a place where you write code. It becomes a place where you direct an agent, watch how it reasons, manage what it pays attention to, and decide when its output is worth shipping. That was the defining theme of the inaugural JetBrains x Codex Hackathon: across roughly 40 submissions over a single weekend, teams explored what it actually means to build with AI natively inside the IDE – not bolted on top of it. The six finalists came up with some of the most compelling answers.
🥇 First Place: hyperreasoning – Aditya Mangalampalli
Most coding agents call the model once and hope for the best. As Aditya puts it: “LLMs spend a lot of time thinking in circles.” Hyperreasoning replaces the single shot with something closer to a search: the system drafts several possible approaches to a task, then a learned controller decides which to expand, which to cut, and which to verify against tests. Compiler errors and failing tests feed back into how the controller weighs its options.
Inside the IDE, a tool window renders the search live, so you can watch which paths the controller explored before settling on one. The argument the project makes is that a smaller local model wrapped in this kind of verified search loop can hold its own against much larger frontier models at meaningfully lower cost — with the IDE serving as the place where reasoning becomes visible and directable, rather than a black box that returns code.
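The core loop is easy to sketch. Below is an illustrative best-first search over candidate patches, where a test pass rate stands in for hyperreasoning's learned controller; the function names and scoring are assumptions for illustration, not the project's actual API:

```python
import heapq
import itertools

def verified_search(draft_fn, expand_fn, run_tests, n_drafts=4, budget=16):
    """Best-first search over candidate fixes, guided by test results.

    run_tests returns a score in [0, 1]; 1.0 means every test passes.
    (Illustrative stand-in for a learned controller, not the real system.)
    """
    counter = itertools.count()  # tie-breaker so the heap never compares candidates
    frontier = []

    def push(cand):
        heapq.heappush(frontier, (-run_tests(cand), next(counter), cand))

    for _ in range(n_drafts):        # draft several approaches up front
        push(draft_fn())
    for _ in range(budget):
        if not frontier:
            break
        neg_score, _, cand = heapq.heappop(frontier)
        if neg_score == -1.0:        # all tests pass: verified solution
            return cand
        for child in expand_fn(cand):  # refine the most promising branch
            push(child)
    return None
```

The compiler/test feedback the article mentions enters through `run_tests`: a failing branch scores low and naturally falls behind in the frontier.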
🥈 Second Place: Scopecreep – Bhavik Sheoran, Kenneth Ross, Roman Javadyan, Joon Im
Hardware bring-up is a tool-juggling exercise: schematic viewer in one window, vendor apps for the oscilloscope and power supply in others, a terminal talking to the device, a spreadsheet collecting results. Scopecreep collapses that into a single JetBrains tool window. Hand it a circuit schematic and an agent works through testing the board – picking signals worth measuring, capturing the readings, and producing a report.
The design choice worth noticing: when the agent decides a probe needs to be placed, the session pauses and shows the engineer exactly where to put it. The engineer places the probe physically and clicks Resume. It’s the right call for real instruments on a real bench – autonomous where a computer can be trusted, human-in-the-loop where the work touches the physical world.
🥉 Third Place: mesh-code – Ayush Ojha, Coco Cao, Kush Ise, AL DRAM
Switch machines mid-task, and your coding agent starts over. mesh-code fixes that by giving agents shared memory of an in-progress project – what’s been tried, what’s been decided, what’s still pending – so a session that begins on one laptop can continue from another, with whichever agent happens to be available. Codex is one of the agents that can plug in.
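A minimal sketch of what such a shared session record could look like, assuming a simple JSON-serializable schema (the field names here are invented for illustration, not mesh-code's actual format):

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class SessionMemory:
    """Portable record of an in-progress task that any agent can resume."""
    task: str
    tried: list = field(default_factory=list)      # approaches already attempted
    decisions: list = field(default_factory=list)  # choices that are settled
    pending: list = field(default_factory=list)    # what the next agent picks up

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, s: str) -> "SessionMemory":
        return cls(**json.loads(s))
```

Because the record is plain data rather than model-specific state, a session serialized on one laptop can be deserialized by whichever agent is available on another.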
Latent Signal – Periscope
Long agent sessions accumulate dead weight: tool outputs nobody needs anymore, dead ends, context that was useful ten turns ago and isn’t now. Periscope, built on Wes McKinney’s open-source agentsview, is a JetBrains plugin that shows what’s actually filling up an agent’s working memory turn by turn – and recommends what to do about it, whether that’s continuing, rewinding to a better branching point, compacting, forking, or handing off entirely. It works with Codex and most other coding agents, and everything stays local.
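As a rough illustration of the idea, here is a toy version of turn-by-turn context accounting; the 4-characters-per-token estimate and the thresholds are assumptions for the sketch, not Periscope's actual heuristics:

```python
def recommend(turns, window=128_000):
    """turns: list of (label, text) pairs. Returns a coarse recommendation.

    Token counts are approximated at ~4 characters per token; thresholds
    are illustrative, not the tool's real decision rules.
    """
    sizes = [(label, len(text) // 4) for label, text in turns]
    total = sum(n for _, n in sizes)
    if total < 0.5 * window:
        return "continue"
    heaviest = max(sizes, key=lambda s: s[1])[0]
    if total < 0.8 * window:
        return f"compact (largest contributor: {heaviest})"
    return "rewind or fork: context nearly full"
```

Even this toy version shows why the turn-by-turn view matters: the recommendation depends on which turn is filling the window, not just on the total.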
SecureLoop – Abhiram Sribhashyam, Rahul Marri, Peyton Li
Security incident response is still mostly copy-paste: stack trace into a chat window, repo context explained by hand, a fix written and committed in the hope it’s safe. SecureLoop turns that into a controlled loop inside JetBrains. When something breaks in production, the agent gathers the relevant code, the project’s security rules, and the state of its dependencies, then asks Codex for a structured diagnosis and a proposed fix. That fix runs through automated checks before any pull request opens.
The PR opens automatically. The merge does not. SecureLoop surfaces everything that informed the decision – the diff, the policy it bumped into, the test that proved the patch – inside the IDE for the developer to approve or reject. As the team put it: “Codex fully makes the PR ready for you, and it remains human-in-the-loop where you have to approve or deny.”
The team’s bigger thesis is a security-policy.md file that lives in the repo alongside README.md, spelling out a project’s specific rules for handling secrets, errors, and risky patterns. Coding agents read it before suggesting changes, so the question stops being “what’s a good fix?” and becomes “what’s an acceptable fix under this codebase’s rules?”
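To make the idea concrete, a hypothetical security-policy.md might look like the following (contents invented for illustration, not the team's actual file):

```markdown
# Security Policy (agent-readable)

## Secrets
- Never hardcode credentials; read them from the secrets manager only.
- Redact tokens and keys from logs and error messages.

## Error handling
- Do not log request bodies or authentication headers in error paths.
- Fail closed: an unhandled error must not leave a resource exposed.

## Risky patterns
- No `eval` or string-built SQL; use parameterized queries.
- New dependencies require a pinned version and a license check.
```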
Pinpoint – Het Patel
Frontend feedback delivered through a chat window is unavoidably vague. “Move that element” or “change that color” leaves the agent guessing which element you actually mean. Pinpoint takes that piece of the ambiguity off the table: developers drop pins directly on a live page, attach a comment to each, and send the whole batch to the agent with precise on-page context attached. The agent now knows exactly which element you meant – even if it still has to figure out what change you want.
The project ships in two pieces: one for annotating web pages in a browser, and a desktop companion for marking up anything visible on screen – useful when the interface in question isn’t a web page.
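A pin batch could plausibly be shaped like the structure below; the field names are invented for illustration and are not Pinpoint's real payload:

```python
# Hypothetical shape of an annotation batch sent from the browser to the agent.
pin_batch = {
    "page": "https://example.com/checkout",
    "pins": [
        {
            "selector": "#submit-btn",         # element the pin is anchored to
            "position": {"x": 412, "y": 880},  # on-page coordinates
            "comment": "Move this button above the fold",
        },
        {
            "selector": ".price-label",
            "position": {"x": 120, "y": 340},
            "comment": "Use the brand green here",
        },
    ],
}

def to_agent_prompt(batch):
    """Flatten the batch into per-element instructions for the agent."""
    return [f'{p["selector"]}: {p["comment"]}' for p in batch["pins"]]
```

The selector is what removes the guesswork: the agent receives an unambiguous element reference alongside each comment instead of a prose description.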
What the finalists show
Looking across these six projects, a clear pattern emerges. Codex embedded in the IDE isn’t just a faster way to write code – it’s a reasoning layer you can watch think, a structured output engine you can direct, a participant in workflows that span hardware instruments, production alerts, shared session state, and context windows. And the IDE becomes the place where all of that comes together: visible, controllable, and version-controlled.
That’s the possibility these teams spent a weekend proving out, and it’s only the beginning.
Phishing campaigns continue to grow in sophistication, blending social engineering, delivery and hosting infrastructure, and authentication abuse to remain effective against evolving security controls. A large-scale credential theft campaign observed by Microsoft Defender Research exemplifies this trend, using code of conduct-themed lures, a multi-step attack chain, and legitimate email services to distribute fully authenticated messages from attacker-controlled domains.
The campaign targeted tens of thousands of users, primarily in the United States, and directed them through several stages of CAPTCHA and intermediate staging pages designed to reinforce legitimacy while filtering out automated defenses. The lures in this campaign used polished, enterprise-style HTML templates with structured layouts and preemptive authenticity statements, making them appear more credible than typical phishing emails and increasing their plausibility as legitimate internal communications. Because the messages contained concerning accusations and repeated time-bound action prompts, the campaign created a sense of urgency and pressure to act.
The attack chain ultimately led to a legitimate sign-in experience that was part of an adversary‑in‑the‑middle (AiTM) phishing flow, which allowed the attackers to proxy the authentication session and capture authentication tokens that could provide immediate account access. Unlike traditional credential harvesting, AiTM attacks intercept authentication traffic in real time, bypassing non-phishing-resistant multifactor authentication (MFA).
In this blog, we’re sharing our analysis of this campaign’s lures, infrastructure, and techniques. Organizations can defend against financial fraud initiated through phishing emails by educating users about phishing lures, investing in advanced anti-phishing solutions like Microsoft Defender for Office 365 and configuring essential email security settings, and encouraging users to employ web browsers that support SmartScreen. Organizations can also enable network protection, which lets Windows use SmartScreen as a host-based web proxy.
Multi-step social engineering campaign leading to credential theft
Between April 14 and 16, 2026, the Microsoft Defender Research team observed a series of sophisticated phishing campaigns targeting more than 35,000 users across over 13,000 organizations in 26 countries, with the majority of targets located in the United States (92%). The campaign did not focus on a single vertical but instead impacted a broad range of industries, most notably Healthcare & life sciences (19%), Financial services (18%), Professional services (11%), and Technology & software (11%). Messages were distributed in multiple distinct waves between 06:51 UTC on April 14 and 03:54 UTC on April 16.
Figure 1. Timeline of campaign messages sent by hour
Figure 2. Campaign recipients by country and industry
Emails in this campaign posed as internal compliance or regulatory communications, using display names such as “Internal Regulatory COC”, “Workforce Communications”, and “Team Conduct Report”. Subject lines included “Internal case log issued under conduct policy” and “Reminder: employer opened a non-compliance case log”.
Message bodies claimed that a “code of conduct review” had been initiated, referenced organization-specific names embedded within the text, and instructed recipients to “open the personalized attachment” to review case materials. At the top of each message, a notice stated that the message had been “issued through an authorized internal channel” and that links and attachments had been “reviewed and approved for secure access”, reinforcing the email’s purported legitimacy. To further support the confidentiality of the supposed review, the end of each message contained a green banner stating that the contents had been encrypted using Paubox, a legitimate service associated with HIPAA-compliant communications.
Figure 3. Sample phishing email
Analysis of the sending infrastructure indicated that the campaign emails were sent using a legitimate email delivery service, likely originating from a cloud-hosted Windows virtual machine. The messages were sent from multiple sender addresses using domains that are likely attacker-controlled.
Each campaign email included a PDF attachment with filenames such as Awareness Case Log File – Tuesday 14th, April 2026.pdf and Disciplinary Action – Employee Device Handling Case.pdf. The attachment provided additional context about the supposed conduct review, including a summary of the review process and instructions for accessing supporting documentation. Recipients were directed to click a “Review Case Materials” link within the PDF, which initiated the credential harvesting flow.
Figure 4. PDF attachment
When clicked, users were initially directed to one of two attacker-controlled domains (for example, acceptable-use-policy-calendly[.]de or compliance-protectionoutlook[.]de). These landing pages displayed a Cloudflare CAPTCHA, presented as a mechanism to validate that the user was coming “from a valid session”. This CAPTCHA likely served as a gating mechanism to impede automated analysis and sandbox detonation.
Figure 5. CAPTCHA challenge
After completing the CAPTCHA, users were redirected to an intermediate site designed to prepare them for the final stage of the attack. This page informed users that the requested documentation was encrypted and required account authentication. While this stage of the attack has several hallmarks of device code phishing, we were only able to confirm the AiTM portion of the attack chain.
Figure 6. Intermediate site asking users to click “Review & Sign”
After clicking the provided “Review & Sign” button, users were presented with a sign-in prompt requesting their email address.
Figure 7. Prompt directing users to enter their email address
After submission, users were required to complete a second CAPTCHA involving image selection.
Figure 8. Second CAPTCHA challenge
Once these steps were completed, users were shown a message indicating that verification was successful and that their “case” was being prepared.
Figure 9. Message telling users that “Verification completed successfully”
Following these steps, users were redirected to a third site hosting the final stage of the attack. Analysis of the underlying code indicates that the final destination varied depending on whether the user accessed the workflow from a mobile device or a desktop system.
Figure 10. Code used to redirect users based on platform
On the final page, users were informed that all materials related to their code of conduct review had been “securely logged”, “time-stamped”, and “maintained within the organization’s centralized compliance tracking system”. They were then prompted to schedule a time to discuss the case, which required signing in to their account.
Figure 11. Final page instructed users to sign in
Selecting the “Sign in with Microsoft” option redirected users to a Microsoft authentication page, initiating an AiTM session hijacking flow designed to capture authentication tokens and compromise user accounts.
Mitigation and protection guidance
Microsoft recommends the following mitigations to reduce the impact of this threat. Check the recommendations card for the deployment status of monitored mitigations.
Review the recommended settings for Exchange Online Protection and Microsoft Defender for Office 365 to ensure your organization has established essential defenses and knows how to monitor and respond to threat activity.
Invest in user awareness training and phishing simulations. Attack simulation training in Microsoft Defender for Office 365, which also includes simulating phishing messages in Microsoft Teams, is one approach to running realistic attack scenarios in your organization.
Enable Zero-hour auto purge (ZAP) in Defender for Office 365 to quarantine sent mail in response to newly acquired threat intelligence and retroactively neutralize malicious phishing, spam, or malware messages that have already been delivered to mailboxes.
Encourage users to use Microsoft Edge and other web browsers that support Microsoft Defender SmartScreen, which identifies and blocks malicious websites, including phishing sites, scam sites, and sites that host malware.
Enable password-less authentication methods (for example, Windows Hello, FIDO keys, or Microsoft Authenticator) for accounts that support password-less. For accounts that still require passwords, use authenticator apps like Microsoft Authenticator for multifactor authentication (MFA). Refer to this article for the different authentication methods and features.
Configure automatic attack disruption in Microsoft Defender XDR. Automatic attack disruption is designed to contain attacks in progress, limit the impact on an organization’s assets, and provide more time for security teams to remediate the attack fully.
Microsoft Defender detections
Microsoft Defender customers can refer to the list of applicable detections below. Microsoft Defender coordinates detection, prevention, investigation, and response across endpoints, identities, email, and apps to provide integrated protection against attacks like the threat discussed in this blog.
| Tactic | Observed activity | Microsoft Defender coverage |
| --- | --- | --- |
| Initial access | Phishing emails | Microsoft Defender for Office 365: A potentially malicious URL click was detected; A user clicked through to a potentially malicious URL; Suspicious email sending patterns detected; Email messages containing malicious URL removed after delivery; Email messages removed after delivery; Email reported by user as malware or phish |
| Persistence | Threat actors sign in with stolen valid identities | Microsoft Entra ID Protection: Anomalous Token; Unfamiliar sign-in properties; Unfamiliar sign-in properties for session cookies |
Microsoft Security Copilot is embedded in Microsoft Defender and provides security teams with AI-powered capabilities to summarize incidents, analyze files and scripts, summarize identities, use guided responses, and generate device summaries, hunting queries, and incident reports.
Security Copilot is also available as a standalone experience where customers can perform specific security-related tasks, such as incident investigation, user analysis, and vulnerability impact assessment. In addition, Security Copilot offers developer scenarios that allow customers to build, test, publish, and integrate AI agents and plugins to meet unique security needs.
Threat intelligence reports
Microsoft Defender XDR customers can use the following threat analytics reports in the Defender portal (requires license for at least one Defender XDR product) to get the most up-to-date information about the threat actor, malicious activity, and techniques discussed in this blog. These reports provide the intelligence, protection information, and recommended actions to prevent, mitigate, or respond to associated threats found in customer environments.
Microsoft Security Copilot customers can also use the Microsoft Security Copilot integration in Microsoft Defender Threat Intelligence, either in the Security Copilot standalone portal or in the embedded experience in the Microsoft Defender portal to get more information about this threat actor.
Hunting queries
Microsoft Defender XDR customers can run the following advanced hunting queries to find related activity in their networks:
Campaign emails by sender address
The following query identifies emails associated with this campaign using a message’s sending email address.
EmailEvents
| where SenderMailFromAddress in ("cocpostmaster@cocinternal.com", "nationaladmin@gadellinet.com", "nationalintegrity@harteprn.com", "m365premiumcommunications@cocinternal.com", "documentviewer@na.businesshellosign.de")
Indicators of compromise
| Indicator | Type | Description | First seen | Last seen |
| --- | --- | --- | --- | --- |
| compliance-protectionoutlook[.]de | Domain | Domain hosting malicious campaign content | 2026-04-14 | 2026-04-16 |
| acceptable-use-policy-calendly[.]de | Domain | Domain hosting malicious campaign content | 2026-04-14 | 2026-04-16 |
| cocinternal[.]com | Domain | Domain hosting sender email address | 2026-04-14 | 2026-04-16 |
| gadellinet[.]com | Domain | Domain hosting sender email address | 2026-04-14 | 2026-04-16 |
| harteprn[.]com | Domain | Domain hosting sender email address | 2026-04-14 | 2026-04-16 |
| cocpostmaster[@]cocinternal.com | Email address | Email address used to send campaign emails | 2026-04-14 | 2026-04-16 |
| nationaladmin[@]gadellinet.com | Email address | Email address used to send campaign emails | 2026-04-14 | 2026-04-16 |
| nationalintegrity[@]harteprn.com | Email address | Email address used to send campaign emails | 2026-04-14 | 2026-04-16 |
| m365premiumcommunications[@]cocinternal.com | Email address | Email address used to send campaign emails | 2026-04-14 | 2026-04-16 |
| documentviewer[@]na.businesshellosign.de | Email address | Email address used to send campaign emails | 2026-04-14 | 2026-04-16 |
| Awareness Case Log File – Monday 13th, April 2026.pdf | Filename | Name of PDF attachment containing phishing link | 2026-04-14 | 2026-04-14 |
| Awareness Case Log File – Tuesday 14th, April 2026.pdf | Filename | Name of PDF attachment containing phishing link | 2026-04-15 | 2026-04-15 |
| Awareness Case Log File – Wednesday 15th, April 2026.pdf | Filename | Name of PDF attachment containing phishing link | | |
To hear stories and insights from the Microsoft Threat Intelligence community about the ever-evolving threat landscape, listen to the Microsoft Threat Intelligence podcast.
OpenClaw, one of the fastest-growing open source projects, has already picked up over 350,000 stars and an early community of builders exploring what agentic systems can actually do in practice.
This evening is a chance to bring the OpenClaw community together into the same room.
We’ll kick things off in the early evening with a fireside conversation featuring Peter Steinberger, the ClawFather and creator of OpenClaw, followed by a panel with OpenClaw maintainers and ecosystem builders sharing what’s working—and what’s not—when shipping real agentic systems.
Later in the evening, we’ll move into a series of fast-paced lightning talks and close things out with a relaxed happy hour to connect with other builders.
If you have been following the project or building with it yourself, this is a good chance to meet others, trade notes, and get your claws into what people are actually shipping.
👉 For the full agenda and speaker lineup, please see the registration page.
📍 GitHub HQ, 275 Brannan St., San Francisco 🗓 June 3, 5:30 p.m. – 9 p.m. 📺 Livestream: twitch.tv/github
Drinks and snacks will be provided. There will be a lot here to chew on. No shellfish behavior please. And bring your sharp ideas!
Spots are limited, so register early and come ready to share what you are working on.
‼️ Please note: Submitting a registration does not guarantee attendance. We’ll follow up to confirm successful registrations.
OpenClaw is an open source framework for building and running agentic systems, focused on giving developers real control over how agents execute tasks in the wild. It provides the core pieces for orchestrating tools, managing state, and handling long-running workflows, so you can move beyond prompt demos and ship systems that actually do work. It’s also probably convinced more than a few people to buy a Mac Mini just to run “one small experiment” that somehow turned into a permanent setup.
Hear more about OpenClaw from the creator himself, Peter Steinberger:
We ran the same coding tasks with and without prebundled tooling, across multiple models and languages. Here’s what changed.
Eval-driven development
IDE-native search reduced latency, cost, and budget overruns.
The comparison below uses paired task-level deltas. Aggregate medians and totals are shown for orientation. Budget overruns are tasks that exceeded the USD 0.50 per-task cap.
Median latency: reduced 8.33% (83.11s → 79.03s)
P95 latency: reduced 16.44% (268.71s → 213.17s)
Total cost: reduced 5.60% (USD 44.17 → USD 41.67)
Budget overruns: reduced 33.28% (6.67% → 4.44%)
Why We Built This
When coding agents search code, they default to shell tools. grep and find work, but they’re blind to project structure, symbol boundaries, and language semantics. The agent burns tokens sifting through noisy output and making follow-up calls to narrow things down.
So we tried something obvious: what if the agent could use the IDE’s own search instead?
We built a prebundled skill that pairs a search prompt with a unified MCP tool. One tool, four modes: file search, text search, regex, and symbol lookup. A universal router dispatches calls to the right backend.
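A universal router of that shape is straightforward to sketch. The mode names below match the article, but the dispatcher and backends are illustrative stand-ins, not the plugin's actual implementation:

```python
def make_router(backends):
    """backends: dict mapping a mode name to a callable(query) -> results.

    The agent sees one tool with one signature; the router dispatches to
    the right index-backed search behind the scenes.
    """
    def search(mode, query):
        if mode not in backends:
            raise ValueError(f"unknown mode: {mode!r}, expected one of {sorted(backends)}")
        return backends[mode](query)
    return search

# Stub backends standing in for the IDE's file, text, regex, and symbol indices.
router = make_router({
    "files":   lambda q: [f"paths matching {q!r}"],
    "text":    lambda q: [f"lines containing {q!r}"],
    "regex":   lambda q: [f"matches for pattern {q!r}"],
    "symbols": lambda q: [f"symbols named {q!r}"],
})
```

The design payoff of the single entry point is that the model only has to learn one tool schema, while each mode can still hit a purpose-built index.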
MCP Tools
Functions the agent calls via an MCP server during task execution. IDE-native tools can tap into indices, ASTs, and project models that shell tools cannot see.
Skills
Packaged agent behaviors: a prompt plus orchestration logic. A skill can work on its own, use tools, or ship bundled with the tools it needs.
Nothing ships by default until the eval says it should. We tested four different configurations of this tooling before picking one.
Methodology
The eval pipeline spins up an MCP server alongside the IDE so the agent has access to the configured tools and skills. We run identical coding tasks with and without tooling, then compare with paired delta analysis.
We track four things: quality, latency, cost, and budget discipline. Quality asks whether all tests passed. Latency tracks median and P95 task time. Cost converts token consumption into dollars. Budget discipline tracks how often a single task exceeds the USD 0.50 budget cap.
We report improvement deltas only when they pass our significance threshold: p < 0.05, paired test with 95% confidence intervals. Metrics without a significant change are either omitted from the charts or called out explicitly. We tried four configuration variants, selected the one with the best latency and cost tradeoff, then re-ran it on different models and languages to check that the results held.
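The paired task-level analysis can be sketched in a few lines. This toy version uses a normal-approximation cutoff for the two-sided test at alpha = 0.05; a full analysis would use the exact t-distribution (for example, scipy.stats.ttest_rel):

```python
import math
import statistics

def paired_delta(baseline, treatment):
    """Paired test on per-task deltas.

    baseline, treatment: per-task metric values for the same tasks, in the
    same order. Returns (mean delta, coarse significance flag).
    """
    deltas = [t - b for b, t in zip(baseline, treatment)]
    n = len(deltas)
    mean = statistics.mean(deltas)
    sd = statistics.stdev(deltas)
    t_stat = mean / (sd / math.sqrt(n))
    # Normal-approximation critical value for a two-sided test at alpha = 0.05.
    return mean, abs(t_stat) > 1.96
```

Pairing by task is what makes small per-task improvements detectable: task-to-task variance cancels out of the deltas instead of inflating the error term.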
Eval frame
Same tasks, same grading, one controlled difference.
Quality: all-tests-passed rate, checked before performance claims.
Latency: median and P95 task duration, compared with paired deltas.
Cost: token use converted to dollars across the task set.
Budget discipline: share of tasks exceeding the USD 0.50 single-task cap.
Results
The selected configuration was a prebundled search skill plus a unified IDE-native tool and universal router. Compared with the no-tooling baseline, it reduced latency and cost without producing a statistically significant quality change.
Baseline vs. tooling
Absolute metrics moved in the right direction.
| Metric | Baseline | With tooling | Reduction |
| --- | --- | --- | --- |
| Median latency | 83.11s | 79.03s | 8.33% |
| P95 latency | 268.71s | 213.17s | 16.44% |
| Total cost | USD 44.17 | USD 41.67 | 5.60% |
| Budget overruns | 6.67% | 4.44% | 33.28% |
No statistically significant change in quality. All shown deltas passed the significance threshold.
Trace snapshots
The difference is visible in the agent’s path through the project.
These are shortened traces from cases that improved in both time and cost. The baseline spends more steps discovering context; the prebundled setup gets to the relevant files faster.
Service comments and replies
Prompt: Update service and controller layers for comments and replies.

Before (no prebundled IDE search):
agent> list files -> search x2 -> list files x2
agent> jar inspect x5 -> javap -> jar inspect -> javap x5
agent> curl download -> decompile -> search -> find files x2
agent> read 9 files -> edit file x8 -> respond
time: 472s

After (prebundled skill and unified search):
agent> read SKILL.md -> search x3 -> read 5 files
agent> read FeatureController.java -> read 4 files
agent> edit file x2 -> respond
time: 127s
We tested four tool configurations before choosing the final shape. Lower latency and lower total cost are better, so the lower-left corner of the plot is the target.
Configuration search
The selected option had the best latency while preserving cost reduction.
Chart axes: median latency (78s to 84s) against total cost (USD 39.50 to USD 45.00).
We re-ran the experiment with GPT 5.4 on Java and Kotlin codebases. The pattern holds: latency and cost both drop. Kotlin saw the biggest cost improvement, with total cost falling 13.48%.
Cross-model check
The effect held beyond the original run.
| Run | Median latency reduction | Total cost reduction | P95 latency reduction |
| --- | --- | --- | --- |
| Codex 5.2 | 8.33% | 5.60% | 16.44% |
| GPT 5.4, Java | 3.75% | 4.07% | 13.00% |
| GPT 5.4, Kotlin | 6.92% | 13.48% | not significant |

Entries marked “not significant” did not pass the significance threshold for that model and language.
How Models Adopt Tooling
Codex sends 91% of its search calls through the new IDE-native tool. Claude is a different story: Opus uses it for about half its searches, and Haiku only 28%, preferring grep and find instead.
This makes sense. Claude already has strong built-in code search, so it leans on what it knows. Codex doesn’t, so it grabs the better tool when one is available. The takeaway: prebundled tooling fills gaps. Where the model already has good search, it adds less. Where search is weak, it makes a real difference.
Tool adoption
Models do not use new tools at the same rate.
| Model | IDE Search | grep | find |
| --- | --- | --- | --- |
| Codex | 91% | 8% | 1% |
| Claude Opus | 53% | 28% | 19% |
| Claude Haiku | 28% | 33% | 39% |
What’s Next
The eval pipeline works. Now we’re using it.
We’re running the same experiment on smaller models next. Our hunch is that they’ll benefit even more, since they have less built-in search capability to fall back on.
The current results are strongest on Java and Kotlin. We’re expanding to Python, .NET, and TypeScript with bigger sample sizes.
Meanwhile, the winning configuration is being prepared for the integrated IntelliJ IDEA MCP Server, so agent sessions can use IDE-native tooling when the server is enabled.
The next step is to turn this feature on by default in upcoming AI Assistant plugin updates.
Want to try it before the default rollout?
Set these registry keys to true: llm.chat.agent.codex.mcp.idea, llm.chat.agent.skills.settings.enabled, and llm.agents.contrib.bundled.skills.sync.enabled.
In AI Assistant, choose Codex for the best results.
Ask the agent to find something across the current project.
My hypothesis this year around AI was that if I develop some agent skills to speed up repeatable processes, it might clear up my bandwidth and free up time for me to work on non-repeatable doc tasks. It appears to be working.