Sr. Content Developer at Microsoft, working remotely in PA, TechBash conference organizer, former Microsoft MVP, Husband, Dad and Geek.

How we rebuilt the search architecture for high availability in GitHub Enterprise Server


So much of what you interact with on GitHub depends on search—obviously the search bars and filtering experiences like the GitHub Issues page, but it is also the core of the releases page, projects page, the counts for issues and pull requests, and more. Given that search is such a core part of the GitHub platform, we’ve spent the last year making it even more durable. That means less time spent managing GitHub Enterprise Server, and more time working on what your customers care most about.

In recent years, GitHub Enterprise Server administrators had to be especially careful with search indexes, the special database tables optimized for searching. If they didn’t follow maintenance or upgrade steps in exactly the right order, search indexes could become damaged and need repair, or they might get locked and cause problems during upgrades. Quick context if you’re not familiar with High Availability (HA) setups: they’re designed to keep GitHub Enterprise Server running smoothly even if part of the system fails. You have a primary node that handles all the writes and traffic, and replica nodes that stay in sync and can take over if needed.

Diagram labeled 'HA Architecture' with two boxes: 'Primary Node' and 'Replica Node.' Across both of them there exists an 'Elasticsearch Cluster' with a nested box on each node labeled 'ES Instance.' A pink arrow points from the Primary Node’s ES Instance to the Replica Node’s ES Instance, indicating replication or failover in a high-availability setup.

Much of this difficulty comes from how previous versions of Elasticsearch, our search database of choice, were integrated. HA GitHub Enterprise Server installations use a leader/follower pattern. The leader (primary server) receives all the writes, updates, and traffic. Followers (replicas) are designed to be read-only. This pattern is deeply ingrained into all of the operations of GitHub Enterprise Server.

This is where Elasticsearch started running into issues. Since Elasticsearch couldn’t natively support this primary/replica split, GitHub engineering had to create a single Elasticsearch cluster spanning the primary and replica nodes. This made replicating data straightforward, and it also gave some performance benefits, since each node could handle search requests locally.

Diagram showing 'Primary Node' and 'Replica Node' as part of an 'Elasticsearch Cluster.' The Primary Node contains 'Primary Shard 1,' and the Replica Node contains 'Primary Shard 2.' A pink arrow points from an empty shard slot on the 'Primary Node' to Shard 2, representing the unwanted move of a primary shard to the 'Replica Node.'

Unfortunately, the problems of clustering across servers eventually began to outweigh the benefits. For example, at any point Elasticsearch could move a primary shard (responsible for receiving/validating writes) to a replica. If that replica was then taken down for maintenance, GitHub Enterprise Server could end up in a locked state. The replica would wait for Elasticsearch to be healthy before starting up, but Elasticsearch couldn’t become healthy until the replica rejoined.
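To make that failure mode concrete, here is a minimal sketch of how an administrator might spot a primary shard that has drifted onto the replica node. It assumes shard listings in the shape returned by Elasticsearch’s `GET _cat/shards?format=json` API; the node names `ghe-primary` and `ghe-replica` and the index names are hypothetical stand-ins, not real GitHub Enterprise Server identifiers:

```python
def primaries_on_replica(shards, replica_node="ghe-replica"):
    """Return index names whose primary shard lives on the replica node."""
    return sorted(
        s["index"]
        for s in shards
        # "p" marks a primary shard in the _cat/shards "prirep" column
        if s["prirep"] == "p" and s["node"] == replica_node
    )

# Sample data mimicking `GET _cat/shards?format=json` output
sample = [
    {"index": "issues", "shard": "0", "prirep": "p", "node": "ghe-primary"},
    {"index": "issues", "shard": "0", "prirep": "r", "node": "ghe-replica"},
    {"index": "code",   "shard": "0", "prirep": "p", "node": "ghe-replica"},
]

print(primaries_on_replica(sample))  # ['code']
```

Any index this reports is one whose writes now depend on the replica node staying up, which is exactly the state that could wedge maintenance.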

For a number of GitHub Enterprise Server releases, engineers at GitHub tried to make this mode more stable. We implemented checks to ensure Elasticsearch was in a healthy state, as well as other processes to try to correct drifting states. We went as far as attempting to build a “search mirroring” system that would allow us to move away from the clustered mode. But database replication is incredibly challenging, and these efforts couldn’t deliver the consistency we needed.

What changed?

After years of work, we’re now able to use Elasticsearch’s Cross Cluster Replication (CCR) feature to support HA in GitHub Enterprise Server.

“But David,” you say, “That’s replication between clusters. How does that help here?” 

I’m so glad you asked. With this mode, we’re moving to several “single-node” Elasticsearch clusters: each GitHub Enterprise Server instance now operates as an independent single-node Elasticsearch cluster.

Diagram showing two boxes labeled 'Primary Node' and 'Replica Node.' Each box contains a dashed rectangle labeled 'Elasticsearch Instance / Cluster.' A double-headed pink arrow labeled 'Replicate Index Data (CCR)' connects the two boxes, illustrating bidirectional data replication between the primary and replica Elasticsearch clusters.

CCR lets us share the index data between nodes in a way that is carefully controlled and natively supported by Elasticsearch. It copies data once it’s been persisted to the Lucene segments (Elasticsearch’s underlying data store). This ensures we’re replicating data that has been durably persisted within the Elasticsearch cluster.
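As a rough illustration of what “carefully controlled” means in practice: CCR followers are created per index through Elasticsearch’s follow API (`PUT /<follower_index>/_ccr/follow`). The sketch below only builds the request path and body; the remote-cluster name `leader` is an assumption for illustration, not GitHub Enterprise Server’s actual configuration:

```python
def ccr_follow_request(index, remote_cluster="leader"):
    """Build the path and body for a CCR follow call issued to the replica cluster."""
    path = f"/{index}/_ccr/follow"
    body = {
        "remote_cluster": remote_cluster,  # remote cluster alias registered in cluster settings
        "leader_index": index,             # same index name on the primary cluster
    }
    return path, body

path, body = ccr_follow_request("issues")
print(path)  # /issues/_ccr/follow
```

Sending that body to the replica’s follow endpoint is what turns its local copy of the index into a read-only follower of the primary’s.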

In other words, now that Elasticsearch supports a leader/follower pattern, GitHub Enterprise Server administrators will no longer be left in a state where critical data winds up on read-only nodes.

Under the hood

Elasticsearch has an auto-follow API, but it only applies to indexes created after the policy exists. GitHub Enterprise Server HA installations already have a long-lived set of indexes, so we need a bootstrap step that attaches followers to existing indexes, then enables auto-follow for anything created in the future.

Here’s a sample of what that workflow looks like:

def bootstrap_ccr(primary, replica):
  # Fetch the current indexes on each node
  primary_indexes = list_indexes(primary)
  replica_indexes = list_indexes(replica)

  # Filter out the system indexes
  managed = [i for i in primary_indexes if is_managed_ghe_index(i)]

  # For indexes without a follower yet, initialize that
  #   contract; otherwise verify following is active
  for index in managed:
    if index not in replica_indexes:
      ensure_follower_index(replica, leader=primary, index=index)
    else:
      ensure_following(replica, leader=primary, index=index)

  # Finally, set up auto-follow patterns
  #   so new indexes are automatically followed
  ensure_auto_follow_policy(
    replica,
    leader=primary,
    patterns=[managed_index_patterns],
    exclude=[system_index_patterns],
  )
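That final auto-follow step maps onto Elasticsearch’s auto-follow API (`PUT /_ccr/auto_follow/<name>`). Here is a minimal sketch of the policy body, with illustrative pattern and exclusion values standing in for GitHub Enterprise Server’s real index patterns:

```python
def auto_follow_policy(remote_cluster="leader",
                       patterns=("github-*",),
                       exclusions=(".security*",)):
    """Build the body for PUT /_ccr/auto_follow/<name> on the replica cluster."""
    return {
        "remote_cluster": remote_cluster,
        # Follow every new index matching these patterns on the leader...
        "leader_index_patterns": list(patterns),
        # ...but skip system indexes, which each cluster manages itself
        "leader_index_exclusion_patterns": list(exclusions),
    }

print(auto_follow_policy())
```

Once this policy exists, any index created on the primary that matches the patterns automatically gets a follower on the replica, with no per-index bootstrap step.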

This is just one of the new workflows we’ve created to enable CCR in GitHub Enterprise Server. We’ve needed to engineer custom workflows for failover, index deletion, and upgrades. Elasticsearch only handles the document replication, and we’re responsible for the rest of the index’s lifecycle. 

How to get started with CCR mode 

To get started using the new CCR mode, reach out to support@github.com and let them know you’d like to use the new HA mode for GitHub Enterprise Server. They’ll set up your organization so that you can download the required license.

Once you’ve downloaded your new license, you’ll need to set `ghe-config app.elasticsearch.ccr true`. With that finished, you can run a `config-apply` or upgrade your cluster to 3.19.1, the first release to support this new architecture.

When your GitHub Enterprise Server restarts, Elasticsearch will migrate your installation to use the new replication method. This will consolidate all the data onto the primary nodes, break clustering across nodes, and restart replication using CCR. This update may take some time depending on the size of your GitHub Enterprise Server instance.

While the new HA method is optional for now, we’ll be making it our default over the next two years. We want to ensure there’s ample time for GitHub Enterprise administrators to get their feedback in, so now is the time to try it out. 

We’re excited for you to start using the new HA mode for a more seamless experience managing GitHub Enterprise Server. 

Want to get the most out of search on your High Availability GitHub Enterprise Server deployment? Reach out to support to get set up with our new search architecture!

The post How we rebuilt the search architecture for high availability in GitHub Enterprise Server appeared first on The GitHub Blog.


The Interesting Case of Flattening Mean Time to Mediocrity


I have a complex relationship with generative AI.

On the one hand, I use it constantly and find it to be a godsend for some formerly laborious stuff.  I’ll never visit another recipe site, comb through some dusty gadget troubleshooting site, or write boilerplate code by hand as long as I live.  Read through terms and conditions?  Pfft.  “Read this and tell me if I should care.”

On the other hand, I can’t tell you how boring I find the endless river of LinkedIn thought leadership about AI.  I’m also not especially fond of prognostication along the lines of “what if the thing that I confuse with a human, and hear me out, did the kinds of things that humans do!?”

Compelling take there, Nostradamus.

This is all to say that my take is this: generative AI is genuinely useful.  But in my estimation, it’s only genuinely useful for a small fraction of what the breathless, incessant hypesters claim it is useful for.  I think this take is similar to Cal Newport’s in this video examining whether the tech is a “disappointment.”

Today, I’d like to zoom in on what it’s actually useful for.

Mean Time to Python Mediocrity

Cal Newport likes it for programming, and I was just reading Jonathan Stark’s daily email, and he seems to agree.  And both of these square with my experience, where just tonight I was having ChatGPT vibe code me up a Python script to aid in a regression analysis of 40ish Google Analytics instances to see if I can find significant correlation between properties of websites and share of traffic referred by answer engines.

Since I have ChatGPT generate throwaway scripts a fair bit, I can safely predict the workflow will be something like this.

  1. I tell it to generate something, but I’m too vague.
  2. I refine.
  3. It generates something.
  4. I get angry and swear at it for a few rounds before I calm down and realize I’m swearing at a probabilistic function.
  5. Regrouping, I give better direction.
  6. Iterate/repeat until joy (or maybe 10% of the time I realize it’s just not up to the task).
  7. Marvel at how quickly I was able to get something up and running.
  8. Marvel at how it generates tech debt even in small applications, like a human entry level programmer.
  9. Stop marveling and move on because I’m trying to get things done.

I don’t write a lot of code these days, and I never really wrote much in Python.  So it’s generally a pretty killer use case that I can get something legitimately useful up and running in less than half an hour.  And since it’s throwaway code, it really doesn’t matter that the code is mediocre.

Time from 0 to mediocrity is less than 30 minutes.

Mean Time to Web Design Mediocrity

This metric isn’t limited to Python, either.  A few years back, I had Hit Subscribe’s head of sales at the time hire a small web dev shop to give the Hit Subscribe site a makeover.  They used some WordPress theme abomination called Divi that I, thankfully, have had minimal occasion to touch.  I’m not positive, but I think this is an original architecture design document for the WYSIWYG editor of it.

Fast forward to last fall, and I wanted to play with some site concepts.  I wanted no part of the Jenga experience of using their design GUI, so I hatched a plan.  You see, this is what the webpage looks like if you force it to render in the classic WordPress editor.  A soup of shortcodes.

I figured there were probably enough desperate souls on the internet with questions about this that ChatGPT would have decent knowledge of it in its training data.  So I described what I wanted and literally prompted it with “vibe code me up a DIVI page and give it to me in shortcodes I can paste in.”  And it worked surprisingly well.

Within half an hour I had something that looked alright-ish.  It was another situation where the time to mediocrity was amazingly low, thanks to Gen AI.  (As a coda, Lyndsey who does growth for Hit Subscribe and is talented with design eventually took over and created the actual, live Osiris page.  I’m not sure how much, if any, vibe-anything she used).

But we’ve got a theme here.  I’m not good at UX stuff.  I’m not good at Python scripting.  But with an LLM I can get to mediocre in minutes, rather than days.

Mean Time to Total Mediocrity

I realize that this is true across the board.  I’ve recently used ChatGPT to help fix up an old Sega Genesis, troubleshoot a firepit, and navigate various state labor law bureaucracies.  My skill level at all of those things is 0, but with the help of the LLM, I can LARP as mediocre.  And heck, with a bit of practice, maybe even ascend to mediocre in earnest.

It makes sense.  LLMs train on the wisdom of the internet, such as it is.  They then use this wisdom to predict what word the user wants to see next.  They are, essentially, an oracle that produces mediocre skill and knowledge instantly, on demand.

LLMs have thus utterly flattened society’s mean time to mediocrity.

But what does this mean, exactly?  What does it mean if anyone can immediately be mediocre at anything they want?

What does it mean if a content marketer with no programming aptitude can suddenly simulate being a mediocre programmer?  How about if a sales rep forwarding a contract can suddenly decide to be a mediocre paralegal?  How about an exec becoming a mediocre version of anyone in the org chart below them to help with micromanagement?  It’s all on the table.

At first blush, this seems like it would be largely and unambiguously positive.  We can all become “T-shaped” or “specializing generalists” or whatever management buzzword for this is poppin’ these days.  If you’re a really good widgeteer, you can still be that, while also bringing universal mediocrity in all other fields to bear, which is certainly a little better than a mix of ineptitude and novicehood that you formerly had.

Considering Possible Downsides

Or is it?

According to a recent study, this capability is starting to produce burnout.

Specifically, because “productivity” and the “variety of tasks they could tackle” increased, people took on more work.  So they take on these new responsibilities, at which they’re immediately, if unearnedly, mediocre.  And then... yeah.  Now you’re doing your original job, plus a bunch of other ones that you’re not actually very good at and that maybe aren’t quite as awesome as you originally thought when you Mary-Sued your way to mediocrity.

But if we zoom out even more and look at the bigger picture in a corporate workforce, what’s the end game here?  Are we going to perform an about-face from millennia of moving toward increased specialization of labor?  Is every content marketer going to be a programmer, every salesman a lawyer, and every executive an individual contributor?  Should everyone become mediocre at everything?

The Unclear (and Interesting) Future of Labor Specialization

This is genuinely a pretty open-ended question and musing on my part.  I don’t intend this as a rhetorical condemnation, and I’m not writing some kind of Swiftian modest proposal.  I’m earnestly curious because this capability seems both locally powerful and macroscopically limiting, so I don’t know where it goes.

And, while it’s cool for me to be able to fix a Sega Genesis in my spare time with my son, I’m not really sure that mediocre web design is the best use of my time in my role at Hit Subscribe.  5 years ago, I wouldn’t have attempted it.  But in 2025, I had ChatGPT in an open browser tab, practically egging me on to indulge this side quest, notwithstanding the fact that any competent management consultant would have slapped my hand before I started prompting.

For better or for worse, I’d argue that the main contribution of GenAI / LLMs to date is smashing our collective mean time to mediocrity from days, weeks, or months, to mere minutes.  What we as humanity do with our newfound and boundless mediocrity is the open question.


The post The Interesting Case of Flattening Mean Time to Mediocrity appeared first on DaedTech.


4 Point Of View Choices For Writers


Primarily, point of view is a matter of distance, and in this post we look at the 4 point of view choices for writers.

We are starting with our viewpoint/point of view (POV) series:

  1. 4 Point Of View Choices For Writers
  2. The Pros And Cons Of Writing In First Person
  3. The Pros And Cons Of Writing In Second Person
  4. The Pros & Cons Of Writing In Third Person

For most writers, viewpoint/point of view is instinctive. We tend to use the viewpoint we are most comfortable with. Look at your favourite books, the ones you tend to reread. Chances are you will write in the same point of view they are written in.

Primarily, viewpoint is a matter of distance. The closer you get to the reader the closer you are to engaging the reader’s emotions and creating a mood. Viewpoint can become overwhelming and complicated, but if you keep it simple and if you are consistent, you’ll be fine.

4 Point Of View Choices For Writers

[Buy the Viewpoint Workbook – a comprehensive 100-page guide to all things viewpoint.]


The post 4 Point Of View Choices For Writers appeared first on Writers Write.


Daily Reading List – March 3, 2026 (#733)


Talked to some smart folks and started building out a test scenario for Agent Skills. I’m continuing my extended run of not writing any code myself, but getting better at prompting my AI tooling to get what I want.

[blog] The Software Development Lifecycle Is Dead. If you read one thing today, make it this. And then reconsider everything you thought you knew.

[blog] Gemini 3.1 Flash-Lite: Built for intelligence at scale. Our fastest and most cost-effective model in the Gemini 3 family.

[blog] Secure Code Execution for the Age of Autonomous AI Agents. My colleague built an open source project to isolate MCP sessions using gVisor. Check it out.

[blog] Set Safe Defaults for Flags. Choose default values for flags that minimize the chance of a costly mistake. Sounds like good advice to me!

[article] What I learned from the book Software Engineering at Google. As mentioned in a post yesterday, coding is changing but software engineering is stable. These ideas still hold up!

[blog] Agentic Software Development: Defining The Next Phase Of AI‑Driven Engineering Tools. Diego has been on this shift for a while, and calls out the current wave. I worry about doing a lengthy vendor assessment though, as by the time he’s done it’ll all be different.

[blog] Centralized policy meets distributed logic: Getting to know Eventarc Advanced. We’re too lowkey about this service, but it’s pretty cool. This idea of “centralized policy, distributed logic” is an improvement.

[blog] Go is the Best Language for AI Agents. It’s an outstanding choice. Might be the top choice by devs before end of year. We’ll see.

[blog] Announcing the MCP Toolbox Java SDK. Bring 40+ data sources to your agentic apps with this new SDK for Java devs.

[blog] Automated Code Review: The 6-Month Evolution. Back to the first post on this list, I don’t see how you avoid this direction.

[blog] How to Kill the Code Review. OMG, it’s dead. I’m now convinced after today’s reading list 🙂



0.0.421


2026-03-03

  • Autopilot permission dialog appears on first prompt submission instead of on mode switch
  • AUTO theme now reads your terminal's ANSI color palette and uses it directly, so colors match your terminal theme
  • Add structured form input for the ask_user tool using MCP Elicitations (experimental)
  • Plugin commands read extraKnownMarketplaces from project-level .claude/settings.json for Claude compatibility
  • Git hooks can detect Copilot CLI subprocesses via the COPILOT_CLI=1 environment variable to skip interactive prompts
  • Spurious "write EIO" error entries no longer appear in the timeline during session resume or terminal state transitions
  • Python-based MCP servers no longer time out due to buffered stdout
  • Show an error when the --model flag specifies an unavailable model
  • MCP server availability correctly updates after signing in, switching accounts, or signing out
  • Display clickable PR reference next to branch name in the status bar
  • Add --plugin-dir flag to load a plugin from a local directory
  • Mouse text selection is automatically copied to the Linux primary selection buffer (middle-click to paste)
  • Fix VS Code shift+enter and ctrl+enter keybindings for multiline input
  • Use consistent ~/.copilot/pkg path for auto-update instead of XDG_STATE_HOME
  • ACP clients can configure reasoning effort via session config options
  • Click links in the terminal to open them in your default browser
  • Support repo-level config via .github/copilot/config.json for shared project settings like marketplaces and launch messages
  • Streaming output no longer truncates when running in alt-screen mode
  • Right-click paste no longer produces garbled text on Windows
  • Shell command output on Windows no longer renders as "No changes detected" in the timeline
  • GitHub API errors no longer appear as raw HTTP messages in the terminal when using the # reference picker
  • Markdown tables render with proper column widths, word wrap, and Unicode borders that adapt to terminal width
  • MCP elicitation form displays taller multi-line text input, hides tab bar for single-field forms, and fixes error flashing on field navigation

Copilot Java SDK 1.0.10

1 Share

Installation

⚠️ Disclaimer: This is an unofficial, community-driven SDK and is not supported or endorsed by GitHub. Use at your own risk.

📦 View on Maven Central

📖 Documentation · Javadoc

Maven

<dependency>
    <groupId>io.github.copilot-community-sdk</groupId>
    <artifactId>copilot-sdk</artifactId>
    <version>1.0.10</version>
</dependency>

Gradle (Kotlin DSL)

implementation("io.github.copilot-community-sdk:copilot-sdk:1.0.10")

Gradle (Groovy DSL)

implementation 'io.github.copilot-community-sdk:copilot-sdk:1.0.10'

What's Changed

📦 Other Changes

  • Fix invalid previous_tag parameter in release workflow by @Copilot in #134
  • Remove non-existent test coverage workflow from README by @Copilot in #133
  • Upstream sync: 2 commits (8598dc3, 304d812) by @Copilot in #136
  • Upstream sync: Add clone() methods to config classes (6 commits) by @Copilot in #138
  • Update docs coverage and hooks reference by @brunoborges in #141
  • Merge upstream SDK changes (2026-02-17) by @brunoborges in #142
  • Upstream sync: clientName, deny-by-default permissions, PermissionHandler.APPROVE_ALL by @Copilot in #144
  • Upstream sync: no-op (Python-only changes, 2026-02-23) by @Copilot in #146
  • Create Architectural Decision Record regarding SemVer policy pre 1.0 for breaking changes, such as introducing Virtual Threads. by @edburns in #149
  • Upstream sync: GitHubToken rename, sendAndWait cancellation fix, Foundry Local docs by @Copilot in #148
  • Upgrade Jackson to 2.21.1 to fix async parser DoS vulnerability (GHSA-72hv-8253-57qq) by @brunoborges in #155
  • Comply with steps sized XS or S in https://github.com/github/open-source-releases/issues/667. by @edburns in #158
  • On branch edburns/update-license by @edburns in #159
  • Fix CompactionTest timeout caused by prompt mismatch with snapshot by @Copilot in #160
  • [upstream-sync] Port 28 upstream commits from github/copilot-sdk (f0909a7→b9f746a) by @Copilot in #157
  • Sync docs and samples with breaking session permission API changes by @Copilot in #164
  • Upstream sync: session.setModel() and built-in tool override support by @Copilot in #162
  • Restructure upstream-merge prompt flow and add explicit documentation-impact gate by @Copilot in #168
  • Document missing PR #162 CopilotSession APIs in advanced guide by @Copilot in #166

New Contributors

Full Changelog: v1.0.9...v1.0.10
