Sr. Content Developer at Microsoft, working remotely in PA, TechBash conference organizer, former Microsoft MVP, Husband, Dad and Geek.

Microsoft Delays VS Code May 2025 Release; Pushes Ahead on Copilot Chat Open Sourcing

Microsoft has delayed its May 2025 VS Code release while advancing plans to open source its GitHub Copilot Chat extension, part of a broader effort to position the editor as an "open source AI editor."

Enhancing Kubernetes Event Management with Custom Aggregation


Kubernetes Events provide crucial insights into cluster operations, but as clusters grow, managing and analyzing these events becomes increasingly challenging. This blog post explores how to build custom event aggregation systems that help engineering teams better understand cluster behavior and troubleshoot issues more effectively.

The challenge with Kubernetes events

In a Kubernetes cluster, events are generated for various operations - from pod scheduling and container starts to volume mounts and network configurations. While these events are invaluable for debugging and monitoring, several challenges emerge in production environments:

  1. Volume: Large clusters can generate thousands of events per minute
  2. Retention: Default event retention is limited to one hour
  3. Correlation: Related events from different components are not automatically linked
  4. Classification: Events lack standardized severity or category classifications
  5. Aggregation: Similar events are not automatically grouped

To learn more about Events in Kubernetes, read the Event API reference.

Real-world value

Consider a production environment with tens of microservices, where users report intermittent transaction failures:

Without custom aggregation: Engineers waste hours sifting through thousands of standalone events spread across namespaces. By the time they investigate, older events have long since been purged, and correlating pod restarts with node-level issues is practically impossible.

With custom event aggregation: The system groups related events across resources, instantly surfacing correlation patterns such as volume mount timeouts preceding pod restarts. Historical data shows the same pattern appeared during past traffic spikes, pointing to a storage scalability issue in minutes rather than hours.

Organizations that implement this approach commonly cut troubleshooting time significantly and improve system reliability by detecting recurring patterns early.

Building an Event aggregation system

This post explores how to build a custom event aggregation system that addresses these challenges, aligned with Kubernetes best practices. I've picked Go for the examples.

Architecture overview

This event aggregation system consists of three main components:

  1. Event Watcher: Monitors the Kubernetes API for new events
  2. Event Processor: Processes, categorizes, and correlates events
  3. Storage Backend: Stores processed events for longer retention

Here's a sketch of how to implement the event watcher:

package main

import (
    "context"
    "fmt"  // used by the correlation and grouping helpers later in this post
    "sort" // used by identifyPatterns later in this post
    "time" // used by the Pattern type later in this post

    eventsv1 "k8s.io/api/events/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/rest"
)

// EventWatcher streams events.k8s.io/v1 Events from the API server.
type EventWatcher struct {
    clientset *kubernetes.Clientset
}

func NewEventWatcher(config *rest.Config) (*EventWatcher, error) {
    clientset, err := kubernetes.NewForConfig(config)
    if err != nil {
        return nil, err
    }
    return &EventWatcher{clientset: clientset}, nil
}

// Watch opens a watch across all namespaces and forwards events on a
// channel until the context is cancelled or the watch connection closes.
func (w *EventWatcher) Watch(ctx context.Context) (<-chan *eventsv1.Event, error) {
    events := make(chan *eventsv1.Event)

    watcher, err := w.clientset.EventsV1().Events("").Watch(ctx, metav1.ListOptions{})
    if err != nil {
        return nil, err
    }

    go func() {
        defer close(events)
        for {
            select {
            case result, ok := <-watcher.ResultChan():
                if !ok {
                    // The watch connection closed; a production watcher
                    // would re-establish it here.
                    return
                }
                if e, ok := result.Object.(*eventsv1.Event); ok {
                    events <- e
                }
            case <-ctx.Done():
                watcher.Stop()
                return
            }
        }
    }()

    return events, nil
}
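
To show how the pieces fit together, here is a minimal consumer loop wiring the watcher to the processor and storage described in the following sections. It's an illustrative sketch, not part of the original design: it assumes the program runs in-cluster (rest.InClusterConfig is the standard client-go helper), and the run function name is mine.

// Illustrative wiring of watcher -> processor -> storage. Assumes an
// in-cluster configuration; EventProcessor and EventStorage are defined
// in the sections below.
func run(ctx context.Context, processor *EventProcessor, storage EventStorage) error {
    config, err := rest.InClusterConfig()
    if err != nil {
        return err
    }

    watcher, err := NewEventWatcher(config)
    if err != nil {
        return err
    }

    events, err := watcher.Watch(ctx)
    if err != nil {
        return err
    }

    for event := range events {
        processed := processor.Process(event)
        if err := storage.Store(ctx, processed); err != nil {
            // In production you'd buffer or retry rather than drop the event.
            continue
        }
    }
    return nil
}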

Event processing and classification

The event processor enriches events with additional context and classification:

type EventProcessor struct {
    categoryRules    []CategoryRule
    correlationRules []CorrelationRule
}

type ProcessedEvent struct {
    Event         *eventsv1.Event
    Category      string
    Severity      string
    CorrelationID string
    Metadata      map[string]string
}

func (p *EventProcessor) Process(event *eventsv1.Event) *ProcessedEvent {
    processed := &ProcessedEvent{
        Event:    event,
        Metadata: make(map[string]string),
    }

    // Apply classification rules
    processed.Category = p.classifyEvent(event)
    processed.Severity = p.determineSeverity(event)

    // Generate correlation ID for related events
    processed.CorrelationID = p.correlateEvent(event)

    // Add useful metadata
    processed.Metadata = p.extractMetadata(event)

    return processed
}
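
The classification and metadata helpers called by Process are not shown above. A minimal, illustrative version might switch on well-known event reasons; the category names and severity mapping below are assumptions for the sketch, not a Kubernetes standard.

// Illustrative classification rules. The categories are arbitrary;
// FailedScheduling, FailedMount, and BackOff are common event reasons
// emitted by Kubernetes components.
func (p *EventProcessor) classifyEvent(event *eventsv1.Event) string {
    switch event.Reason {
    case "FailedScheduling", "Preempted":
        return "scheduling"
    case "FailedMount", "FailedAttachVolume":
        return "storage"
    case "BackOff", "Failed", "Unhealthy":
        return "workload"
    default:
        return "general"
    }
}

func (p *EventProcessor) determineSeverity(event *eventsv1.Event) string {
    // events.k8s.io/v1 events carry a Type of "Normal" or "Warning".
    if event.Type == "Warning" {
        return "warning"
    }
    return "info"
}

func (p *EventProcessor) extractMetadata(event *eventsv1.Event) map[string]string {
    // Capture a few fields that are useful when querying later.
    return map[string]string{
        "controller": event.ReportingController,
        "node":       event.DeprecatedSource.Host,
    }
}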

Implementing Event correlation

One of the key features you could implement is a way of correlating related Events. Here's an example correlation strategy:

func (p *EventProcessor) correlateEvent(event *eventsv1.Event) string {
    // Correlation strategies:
    // 1. Time-based: Events within a time window
    // 2. Resource-based: Events affecting the same resource
    // 3. Causation-based: Events with cause-effect relationships

    correlationKey := generateCorrelationKey(event)
    return correlationKey
}

func generateCorrelationKey(event *eventsv1.Event) string {
    // Example: Combine namespace, resource type, and name.
    // events.k8s.io/v1 exposes the affected object as Regarding
    // (the core/v1 API calls the same field InvolvedObject).
    return fmt.Sprintf("%s/%s/%s",
        event.Regarding.Namespace,
        event.Regarding.Kind,
        event.Regarding.Name,
    )
}
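
The sketch above implements only the resource-based strategy. A time-based variant could fold a time bucket into the key so that events for the same object within the same window share a correlation ID. The helper and its five-minute window below are assumptions for illustration:

// Hypothetical time-bucketed variant: events for the same object within
// the same five-minute window share a correlation key. The window size
// is an arbitrary assumption.
func generateTimeBucketedKey(event *eventsv1.Event) string {
    bucket := event.EventTime.Time.Truncate(5 * time.Minute).Unix()
    return fmt.Sprintf("%s/%s/%s/%d",
        event.Regarding.Namespace,
        event.Regarding.Kind,
        event.Regarding.Name,
        bucket,
    )
}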

Event storage and retention

For long-term storage and analysis, you'll probably want a backend that supports:

  • Efficient querying of large event volumes
  • Flexible retention policies
  • Support for aggregation queries

Here's a sample storage interface:

type EventStorage interface {
    Store(context.Context, *ProcessedEvent) error
    Query(context.Context, EventQuery) ([]ProcessedEvent, error)
    Aggregate(context.Context, AggregationParams) ([]EventAggregate, error)
}

type EventQuery struct {
    TimeRange     TimeRange
    Categories    []string
    Severity      []string
    CorrelationID string
    Limit         int
}

type AggregationParams struct {
    GroupBy    []string
    TimeWindow string
    Metrics    []string
}
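
While evaluating real backends, a throwaway in-memory implementation can stand in during testing. This is a deliberately naive sketch of my own (mutex-guarded slice, linear scans), not a recommended design; it requires the "sync" import.

// InMemoryStorage is an illustrative, non-production EventStorage:
// a mutex-guarded slice with linear scans for queries.
type InMemoryStorage struct {
    mu     sync.Mutex
    events []ProcessedEvent
}

func (s *InMemoryStorage) Store(ctx context.Context, e *ProcessedEvent) error {
    s.mu.Lock()
    defer s.mu.Unlock()
    s.events = append(s.events, *e)
    return nil
}

func (s *InMemoryStorage) Query(ctx context.Context, q EventQuery) ([]ProcessedEvent, error) {
    s.mu.Lock()
    defer s.mu.Unlock()
    var out []ProcessedEvent
    for _, e := range s.events {
        // Filter on correlation ID only; other query fields are left
        // out of this sketch.
        if q.CorrelationID != "" && e.CorrelationID != q.CorrelationID {
            continue
        }
        out = append(out, e)
        if q.Limit > 0 && len(out) >= q.Limit {
            break
        }
    }
    return out, nil
}

func (s *InMemoryStorage) Aggregate(ctx context.Context, p AggregationParams) ([]EventAggregate, error) {
    return nil, nil // aggregation omitted from this sketch
}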

Good practices for Event management

  1. Resource Efficiency

    • Implement rate limiting for event processing
    • Use efficient filtering at the API server level
    • Batch events for storage operations
  2. Scalability

    • Distribute event processing across multiple workers
    • Use leader election for coordination
    • Implement backoff strategies for API rate limits
  3. Reliability

    • Handle API server disconnections gracefully
    • Buffer events during storage backend unavailability
    • Implement retry mechanisms with exponential backoff (see the sketch after this list)
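
As an example of that last point, here is a minimal sketch of retrying a storage write with exponential backoff, using the wait helpers from k8s.io/apimachinery/pkg/util/wait. The helper name and backoff parameters are illustrative assumptions.

// Retry a storage write with exponential backoff. The parameters
// (500ms initial delay, doubling, five attempts) are assumptions.
// Requires the k8s.io/apimachinery/pkg/util/wait import.
func storeWithRetry(ctx context.Context, storage EventStorage, e *ProcessedEvent) error {
    backoff := wait.Backoff{
        Duration: 500 * time.Millisecond, // initial delay
        Factor:   2.0,                    // multiply the delay each attempt
        Jitter:   0.1,                    // up to 10% random jitter
        Steps:    5,                      // give up after five attempts
    }
    return wait.ExponentialBackoff(backoff, func() (bool, error) {
        if err := storage.Store(ctx, e); err != nil {
            return false, nil // not done yet; retry after the next delay
        }
        return true, nil // success; stop retrying
    })
}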

Advanced features

Pattern detection

Implement pattern detection to identify recurring issues:

// Pattern summarizes a group of similar events recurring over time. The
// type is referenced but never defined elsewhere in this post; this
// definition is inferred from the fields used below.
type Pattern struct {
    Type         string
    Count        int
    FirstSeen    time.Time
    LastSeen     time.Time
    Frequency    float64 // events per minute
    EventSamples []ProcessedEvent
}

type PatternDetector struct {
    patterns  map[string]*Pattern
    threshold int
}

func (d *PatternDetector) Detect(events []ProcessedEvent) []Pattern {
    // Group similar events
    groups := groupSimilarEvents(events)

    // Analyze frequency and timing
    patterns := identifyPatterns(groups)

    return patterns
}

func groupSimilarEvents(events []ProcessedEvent) map[string][]ProcessedEvent {
    groups := make(map[string][]ProcessedEvent)

    for _, event := range events {
        // Create similarity key based on event characteristics
        // (Regarding is the events.k8s.io/v1 name for InvolvedObject)
        similarityKey := fmt.Sprintf("%s:%s:%s",
            event.Event.Reason,
            event.Event.Regarding.Kind,
            event.Event.Regarding.Namespace,
        )

        // Group events with the same key
        groups[similarityKey] = append(groups[similarityKey], event)
    }

    return groups
}


func identifyPatterns(groups map[string][]ProcessedEvent) []Pattern {
    var patterns []Pattern

    for key, events := range groups {
        // Only consider groups with enough events to form a pattern
        if len(events) < 3 {
            continue
        }

        // Sort events by time. events.k8s.io/v1 carries a single
        // EventTime; the first/last timestamps of core/v1 survive only
        // as deprecated compatibility fields.
        sort.Slice(events, func(i, j int) bool {
            return events[i].Event.EventTime.Time.Before(events[j].Event.EventTime.Time)
        })

        // Calculate time range and frequency
        firstSeen := events[0].Event.EventTime.Time
        lastSeen := events[len(events)-1].Event.EventTime.Time
        duration := lastSeen.Sub(firstSeen).Minutes()

        var frequency float64
        if duration > 0 {
            frequency = float64(len(events)) / duration
        }

        // Create a pattern if it meets threshold criteria
        if frequency > 0.5 { // More than one event per two minutes
            pattern := Pattern{
                Type:         key,
                Count:        len(events),
                FirstSeen:    firstSeen,
                LastSeen:     lastSeen,
                Frequency:    frequency,
                // Keep up to 3 samples (min is a Go 1.21+ builtin)
                EventSamples: events[:min(3, len(events))],
            }
            patterns = append(patterns, pattern)
        }
    }

    return patterns
}

With this implementation, the system can identify recurring patterns such as node pressure events, pod scheduling failures, or networking issues that occur with a specific frequency.

Real-time alerts

The following example provides a starting point for building an alerting system based on event patterns. It is not a complete solution but a conceptual sketch to illustrate the approach.

type AlertManager struct {
    rules     []AlertRule
    notifiers []Notifier
}

func (a *AlertManager) EvaluateEvents(events []ProcessedEvent) {
    for _, rule := range a.rules {
        if rule.Matches(events) {
            alert := rule.GenerateAlert(events)
            a.notify(alert)
        }
    }
}
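
The AlertRule and Notifier types are left abstract in the sketch above. One plausible shape for them, plus a simple volume-based rule; all of this is assumed for illustration rather than taken from the original design.

// Assumed shapes for the abstract types above; illustrative only.
type Alert struct {
    Message string
}

type AlertRule interface {
    Matches(events []ProcessedEvent) bool
    GenerateAlert(events []ProcessedEvent) Alert
}

type Notifier interface {
    Send(alert Alert) error
}

// HighWarningVolumeRule fires when a batch contains too many Warning
// events; the threshold and the "warning" severity string match the
// classification sketch earlier in this post.
type HighWarningVolumeRule struct {
    Threshold int
}

func (r HighWarningVolumeRule) Matches(events []ProcessedEvent) bool {
    warnings := 0
    for _, e := range events {
        if e.Severity == "warning" {
            warnings++
        }
    }
    return warnings >= r.Threshold
}

func (r HighWarningVolumeRule) GenerateAlert(events []ProcessedEvent) Alert {
    return Alert{Message: fmt.Sprintf("high warning volume: %d events in batch", len(events))}
}

func (a *AlertManager) notify(alert Alert) {
    for _, n := range a.notifiers {
        _ = n.Send(alert) // error handling omitted in this sketch
    }
}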

Conclusion

A well-designed event aggregation system can significantly improve cluster observability and troubleshooting capabilities. By implementing custom event processing, correlation, and storage, operators can better understand cluster behavior and respond to issues more effectively.

The solutions presented here can be extended and customized based on specific requirements while maintaining compatibility with the Kubernetes API and following best practices for scalability and reliability.

Next steps

Future enhancements could include:

  • Machine learning for anomaly detection
  • Integration with popular observability platforms
  • Custom event APIs for application-specific events
  • Enhanced visualization and reporting capabilities

For more information on Kubernetes events and custom controllers, refer to the official Kubernetes documentation.


We've Finally Reached the End of the Road For Intel Macs

MacOS Tahoe will be the final version of macOS to support some specific Intel-based Macs.

I Hit “Apply” 142 Times—and Heard Back Twice


Last spring, I spent more nights on LinkedIn Jobs than on Netflix. One role even dragged me through four rounds of interviews before ghosting me completely.

Somehow, the bar keeps rising—referral? Video intro? Homework project?—and still we’re told “just follow up.” After the 30-hour week called “full-time recruiting,” I rarely had energy left to chase down recruiters’ emails, let alone write something thoughtful.

So, I built a shortcut for myself.


The Idea in Plain English

One click = one polite follow-up.

  • Click a little button in Chrome right after submitting an application.
  • The extension grabs the company name and role.
  • Behind the curtain, it drafts a short email and tries to find a real recruiter’s address.
  • If it finds one, it sends; if not, it gives me a copy-paste draft.

No tabs, no digging through Google, no wondering what to say.


What I Actually Did (Non-Engineer Edition)

  1. Googled “how to make a Chrome extension.” Copied the starter template, swapped icons, and added a textbox for my webhook link.

  2. Hooked it to Make.com. Think of Make as Lego blocks for the internet—drag this, drop that, tell it “when you see X, do Y.”

  3. Asked ChatGPT to write the email. Prompt: “In 120 words or less, polite, mention the job title, express genuine interest.”

  4. Pulled recruiter info from Apollo. Free tier—best thing for a student budget.

  5. Tested on myself. It worked.


Why Bother Following Up Anyway?

  • Silence ≠ rejection. Recruiters sift through hundreds of apps; the polite ping helps them remember you.
  • It shows you care. If two candidates look equal on paper, the one who reaches out usually wins.
  • It builds a real contact list. Even if you don’t land this role, you now know a human at that company.

Honest Lessons (So Far)

  • Friction kills good intentions. If I have to copy-paste anything, I’ll put it off until “tomorrow,” which never comes.
  • Side projects are therapy. Shipping something—even half-baked—felt better than refreshing my inbox for the 100th time.

What’s Next (After I Catch My Breath)

  • Clean UI, no setup screens (my friends will never paste a webhook, and I can’t blame them).
  • Toggle for internship vs. full-time tone. Same click, slightly different voice.
  • Open-source once it’s not embarrassing. I’ll post the repo before fall recruiting hits.

Build With Me

If you’re a fellow builder looking for a teammate, I’d love to collaborate:

  • JavaScript tinkerers who know their way around Chrome Extensions
  • No-code automators ready to stretch Make.com in new ways
  • Prompt-crafting nerds who can wring the perfect tone from ChatGPT

Ping me on LinkedIn (Parviz Sadikov) or email sadikov@uw.edu


Node v24.2.0 (Current)


Coffee and Open Source Conversation - Shawn Wildermuth

From: Isaac Levin
Duration: 1:10:41
Views: 3

Shawn Wildermuth has been tinkering with computers and software since he got a Vic-20 back in the early '80s. He has been a Microsoft MVP since 2003. You may have taken one of his more than twenty-five courses on Pluralsight. He's authored eight books and innumerable articles on software development. You can also see him at one of the local and international conferences he's spoken at, including Techorama, KCDC, Stir Trek, TechBash, DevSum, and DevConnections.

You can follow Shawn on Social Media
https://wildermuth.com/en/
https://twitter.com/shawnwildermuth
https://linkedin.com/in/shawnwildermuth
https://www.youtube.com/c/swildermuth
https://github.com/shawnwildermuth
https://bsky.app/profile/wildermuth.com
https://fosstodon.org/@shawnwildermuth

PLEASE SUBSCRIBE TO THE PODCAST

- Spotify: http://isaacl.dev/podcast-spotify
- Apple Podcasts: http://isaacl.dev/podcast-apple
- Google Podcasts: http://isaacl.dev/podcast-google
- RSS: http://isaacl.dev/podcast-rss

You can check out more episodes of Coffee and Open Source on https://www.coffeeandopensource.com

Coffee and Open Source is hosted by Isaac Levin (https://twitter.com/isaacrlevin)
