Sr. Content Developer at Microsoft, working remotely in PA, TechBash conference organizer, former Microsoft MVP, Husband, Dad and Geek.
146278 stories · 33 followers

All the ways TikTok is broken: here’s what’s real and what’s not

While social media rumors have suggested the errors are examples of censorship, more than a day after the issues began, TikTok USDS says the problems are the result of a power outage at a data center that it is working to resolve. Rumors that the app is censoring anti-ICE protests or blocking discussion of Jeffrey Epstein appear to be misguided (even the governor of California is resharing misinformation posted on Twitter by “intelligentpawg” and PopBase): problems and glitches have been blocking all kinds of videos and messages on the service through Monday night.

Early Sunday morning, TikTok’s US arm, now under new ownership, started breaking down just a couple of days after Oracle & Co. took the reins. Its For You page algorithm is suddenly unreliable, comments are failing to load or loading slowly, and publishing new videos seems nearly impossible for many people.

Read on below for the latest updates about the ongoing TikTok problems, as we try to sift through social media claims to see which ones are true and which ones are just more ragebait.

alvinashcraft
51 minutes ago
Pennsylvania, USA

How This Google Tool Turns Messy Text Into Clean Data

Have you ever wrestled with the Herculean task of converting unruly, unstructured data from emails, PDFs, or transcripts into a well-organized format? You might think the crux of the problem is building the application, but dig deeper and it turns out to be the chaotic nature of the text itself. This is where most data-processing efforts fall apart.

In software development, the usual response is to write more rules or reach for deeper natural language processing (NLP). The truly innovative shifts, however, often come from thinking outside the box, or in this case from a tool that is flying under the radar yet making substantial waves: Lang Extract. It reflects how quickly data extraction is evolving, and I recently dug into its capabilities to see why it is capturing developers' interest so rapidly.

This video is from Better Stack.

At its core, Lang Extract might look like a typical extraction library. What distinguishes it is that it uses large language models (LLMs) such as Gemini or GPT to pull structured data out of the messiest text: entities, attributes, and relationships, organized into clean outputs such as JSON or interactive HTML. What truly sets it apart is that every extracted item is anchored to the exact snippet of text it came from. Developers can trace any piece of data back to its origin, which makes verification straightforward and adds a layer of trust and reliability that traditional methods lack.
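To make the idea of anchoring extractions to the source concrete, here is a minimal, library-free sketch. It is not how Lang Extract works internally; it assumes the model returns the exact substring it extracted and simply records character offsets so each value can be checked against the original text. `GroundedExtraction`, `ground_extractions`, and the sample note are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class GroundedExtraction:
    label: str          # entity class, e.g. "medication" (hypothetical)
    text: str           # the exact text the model says it extracted
    start: int | None   # character offset in the source, or None if not found
    end: int | None


def ground_extractions(source: str, extractions: list[tuple[str, str]]) -> list[GroundedExtraction]:
    """Attach character offsets to each (label, text) pair by locating it in the source."""
    grounded = []
    cursor = 0  # search forward first so repeated mentions map to successive occurrences
    for label, text in extractions:
        start = source.find(text, cursor)
        if start == -1:
            start = source.find(text)  # fall back to the first occurrence anywhere
        if start == -1:
            # Not in the source at all: keep it, but mark it as ungrounded for review.
            grounded.append(GroundedExtraction(label, text, None, None))
            continue
        end = start + len(text)
        grounded.append(GroundedExtraction(label, text, start, end))
        cursor = end
    return grounded


note = "Patient was given 250 mg amoxicillin twice daily for sinusitis."
items = ground_extractions(
    note,
    [("dosage", "250 mg"), ("medication", "amoxicillin"), ("condition", "pneumonia")],
)
for item in items:
    status = "grounded" if item.start is not None else "NOT FOUND in source"
    print(f"{item.label:<10} {item.text!r:<15} {item.start}-{item.end} {status}")
```

The third item shows why grounding matters: a value that cannot be located in the source is a likely hallucination, so it gets flagged instead of silently accepted.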

The workflow is simple: you write a prompt, the extraction runs, and you get back structured output that you can verify in detail. In one of my explorations, I worked on a project involving clinical notes. To a computer, those notes were just blobs of text, meaningless without human interpretation. After setting up Lang Extract with my Gemini API key and starting a Python script, I described in my prompt what I needed: the entities, attributes, and relationships relevant to my query. Remarkably, no training data or model fine-tuning was required.
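Because the extraction is driven by a plain-language prompt rather than a trained model, it helps to check that what comes back matches what you asked for. Below is a minimal sketch, assuming a hypothetical JSON shape in which each item has `class`, `text`, and optional `attributes` keys. Lang Extract's real output format differs, and the hard-coded `raw_response` stands in for an actual model call.

```python
from __future__ import annotations

import json

# Hypothetical entity classes described in the prompt.
ALLOWED_CLASSES = {"medication", "dosage", "condition"}


def parse_extractions(raw_response: str) -> list[dict]:
    """Parse a model's JSON reply and keep only well-formed items with requested classes."""
    data = json.loads(raw_response)
    valid = []
    for item in data.get("extractions", []):
        if not isinstance(item, dict):
            continue
        if item.get("class") not in ALLOWED_CLASSES:
            continue  # the model returned a class we never asked for
        if not isinstance(item.get("text"), str) or not item["text"]:
            continue  # missing or empty extracted text
        item.setdefault("attributes", {})  # normalize so downstream code can rely on the key
        valid.append(item)
    return valid


# Stand-in for the model's reply (hypothetical data).
raw_response = json.dumps({
    "extractions": [
        {"class": "medication", "text": "amoxicillin", "attributes": {"route": "oral"}},
        {"class": "allergy", "text": "penicillin"},
        {"class": "condition", "text": "sinusitis"},
    ]
})

for item in parse_extractions(raw_response):
    print(item["class"], item["text"], item["attributes"])
```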

The result was structured JSON in which each extracted item linked back to the exact sentence it came from. That direct link is invaluable for reviewing, debugging, or explaining results without guesswork. One of the most impressive features I found was the interactive HTML page Lang Extract generates: click an entity and it is highlighted in the original text. This is more than a visualization nicety; it is a practical tool for thorough data audits and reviews.
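Lang Extract produces its own interactive viewer; the sketch below only illustrates the underlying idea of mapping grounded spans back onto the source as highlighted HTML. It is a static, simplified stand-in: the `highlight` function is hypothetical and assumes spans arrive as `(start, end, label)` character offsets into the source text.

```python
from __future__ import annotations

import html


def highlight(source: str, spans: list[tuple[int, int, str]]) -> str:
    """Wrap each (start, end, label) span of `source` in a <mark> tag, escaping all text."""
    parts = []
    cursor = 0
    for start, end, label in sorted(spans):
        if start < cursor:
            continue  # skip overlapping spans to keep the markup well-formed
        parts.append(html.escape(source[cursor:start]))
        parts.append(f'<mark title="{html.escape(label)}">{html.escape(source[start:end])}</mark>')
        cursor = end
    parts.append(html.escape(source[cursor:]))
    return "".join(parts)


note = "Patient was given 250 mg amoxicillin twice daily."
print(highlight(note, [(25, 36, "medication"), (18, 24, "dosage")]))
```

Escaping both the surrounding text and the labels matters here, since clinical or financial text can contain `<`, `&`, or quotes that would otherwise break the page.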

Why are developers drifting away from traditional NLP toward tools like Lang Extract? The answer lies in the practical challenges of messy, unstructured text. It is not just annoying; it costs time and resources and often becomes the weak link in otherwise efficient systems. Lang Extract tackles this head-on by improving accuracy and traceability, which is crucial in fields like healthcare or finance, where compliance and precision are paramount.

The advantages of using Lang Extract extend beyond mere data extraction: the setup is straightforward, and it ensures outputs are grounded. This reduces the typical trust issues associated with LLMs since everything extracted can be verified directly against the source text. Additionally, this tool can handle extensive documents more adeptly than many other available tools and fits seamlessly into both local and cloud-based environments. On the flip side, while it offers numerous benefits, potential users should be aware of issues like LLM costs at scale and limitations with highly noisy texts. Moreover, Lang Extract is primarily Python-based, which might present a learning curve for non-Python developers.
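Long documents usually have to be split to fit a model's context window. The sketch below shows one generic way to do that without losing grounding: overlapping character windows that remember their offset, so a span found inside a window can be mapped back to the full document. The helper names, window size, and overlap are illustrative assumptions, not necessarily the strategy Lang Extract uses.

```python
from __future__ import annotations


def chunk_text(text: str, max_chars: int = 1000, overlap: int = 100) -> list[tuple[int, str]]:
    """Split text into overlapping character windows, returning (offset, chunk) pairs."""
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must be non-negative and smaller than max_chars")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append((start, text[start:start + max_chars]))
        if start + max_chars >= len(text):
            break  # this window already reaches the end of the document
        start += max_chars - overlap  # step forward, re-reading `overlap` characters
    return chunks


def to_global(offset: int, local_start: int, local_end: int) -> tuple[int, int]:
    """Translate a span found inside a chunk back to coordinates in the full document."""
    return offset + local_start, offset + local_end


document = "Intro text. " * 50 + "Patient was prescribed amoxicillin."
target = "amoxicillin"
found = set()  # a set, because overlapping windows can report the same span twice
for offset, chunk in chunk_text(document, max_chars=200, overlap=40):
    local = chunk.find(target)
    if local != -1:
        found.add(to_global(offset, local, local + len(target)))

for start, end in sorted(found):
    assert document[start:end] == target  # the global span still points at the source text
    print(start, end)
```

With these settings, any span no longer than the overlap (40 characters) that straddles a window boundary still appears whole in the following window.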

In conclusion, tools like Lang Extract are pivotal in lowering the barriers to working with complex unstructured data. They not only simplify the extraction process but also provide a level of trust in the outputs—essential in a data-driven world where precision can make or break the efficacy of information systems. If your work revolves around handling data, Lang Extract might just be the ace up your sleeve, enabling you to turn what was once a daunting challenge into a streamlined, efficient process.

Daily Reading List – January 26, 2026 (#707)

I’m flying up to San Jose in a few moments, and I spent some free time this weekend building a demo that showcases the AI-first product development lifecycle. It goes from research to planning to building and deploying to operations. Figured I couldn’t talk about it if I hadn’t tried it myself!

[article] How MCP Server Help AI Act. Quick piece, but it’s a good reminder of what MCP can do for you. I used a couple of servers this past weekend to finish a project faster.

[blog] High-Risk, High-Scale: Guaranteeing Ad Budget Precision at 1 Million Events/Second. Not every architecture will look like yours, but we can still learn from use cases that don’t directly apply to us.

[blog] Just for Fun: Migrating a legacy Spring Boot application with Conductor in Gemini CLI. Daniel takes a dusty Spring Boot app and shows us how to genuinely modernize it with our agentic CLI. Great flow.

[blog] 2026 Predictions from the Battery Team. Everybody’s got an angle, but I like VC predictions given how close they are to what’s relevant in the moment.

[blog] I’m addicted to being useful. Me too, but I’ve also found it’s important to be useful where needed, not everywhere. Sometimes I just need to listen, or watch things play out.

[blog] Improving workflow orchestration with Apache Airflow 3.1 in Cloud Composer. Non-AI software continues to hum along, too. The latest version of Airflow for data processing is available on Google Cloud.

[article] Engineering as Humanity’s Highest Achievement. I’m a huge nerd for giant engineering projects. I love them. Keep building!

[article] Pushing the Agentic Frontier with Ephemeral Messages. Very cool original idea from our Google Antigravity team. This seems to make a big difference in how well the IDE follows instructions over long conversations.

[article] 16 open source projects transforming AI and machine learning. Here’s a timely list, with a couple of things I hadn’t heard of yet.

[blog] Beyond Buy vs Build: A new choice in the world of enterprise software. Lak’s “fourth path” is interesting. Will vertical AI startups generate custom software, saving customers from vibe coding custom SaaS?

[blog] Stop Calling It “Vibe Coding” — It’s Supervised Generation. The “let AI generate everything without watching” crowd is excited right now, but Tim provides a sensible reminder.

[article] Kubernetes 1.35 features that change Day 2 operations. What’s new for platform folks running Kubernetes? Jani covers it well here.

Want to get this update sent to you every day? Subscribe to my RSS feed or via email.

Unlocking Agentic RL Training for GPT-OSS: A Practical Retrospective

Tips for getting coding agents to write good Python tests

Someone asked on Hacker News if I had any tips for getting coding agents to write decent quality tests. Here's what I said:


I work in Python, which helps a lot because there are a TON of good examples of pytest tests floating around in the training data, including fixture libraries for mocking external HTTP APIs, snapshot testing, and other neat patterns.

Or I can say "use pytest-httpx to mock the endpoints" and Claude knows what I mean.

Keeping an eye on the tests is important. The most common anti-pattern I see is large amounts of duplicated test setup code. That isn't a huge deal (I'm much more tolerant of duplicated logic in tests than I am in implementation), but it's still worth pushing back on.

"Refactor those tests to use pytest.mark.parametrize" and "extract the common setup into a pytest fixture" work really well there.

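To illustrate those two prompts, here is a small hypothetical pytest module: a toy `slugify` function (not from the original post) tested through one parametrized test instead of copy-pasted test functions, plus a `db` fixture holding setup that several tests would otherwise duplicate. All names and data are illustrative assumptions.

```python
import sqlite3

import pytest


def slugify(title: str) -> str:
    """Toy function under test: lowercase, keep alphanumerics, join words with '-'."""
    words = "".join(c if c.isalnum() else " " for c in title.lower()).split()
    return "-".join(words)


# One parametrized test replaces several near-identical test functions.
@pytest.mark.parametrize(
    ("title", "expected"),
    [
        ("Hello World", "hello-world"),
        ("  Leading and trailing  ", "leading-and-trailing"),
        ("Python 3.12: What's New?", "python-3-12-what-s-new"),
    ],
)
def test_slugify(title, expected):
    assert slugify(title) == expected


@pytest.fixture
def db(tmp_path):
    """Setup extracted from repeated test bodies: a small SQLite database with two posts."""
    conn = sqlite3.connect(tmp_path / "test.db")
    conn.execute("CREATE TABLE posts (title TEXT)")
    conn.executemany("INSERT INTO posts VALUES (?)", [("Hello World",), ("Second Post",)])
    conn.commit()
    yield conn  # the test runs here; cleanup follows
    conn.close()


def test_post_count(db):
    assert db.execute("SELECT COUNT(*) FROM posts").fetchone()[0] == 2


def test_slugs_from_db(db):
    rows = db.execute("SELECT title FROM posts ORDER BY title")
    assert [slugify(title) for (title,) in rows] == ["hello-world", "second-post"]
```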
Generally though the best way to get good tests out of a coding agent is to make sure it's working in a project with an existing test suite that uses good patterns. Coding agents pick the existing patterns up without needing any extra prompting at all.

I find that once a project has clean basic tests, the new tests added by the agents tend to match them in quality. It's similar to how working on large projects with a team of other developers works: keeping the code clean means that when people look for examples of how to write a test, they'll be pointed in the right direction.

One last tip I use a lot is this:

Clone datasette/datasette-enrichments
from GitHub to /tmp and imitate the
testing patterns it uses

I do this all the time with different existing projects I've written - the quickest way to show an agent how you like something to be done is to have it look at an example.

Tags: testing, coding-agents, python, generative-ai, ai, llms, hacker-news, pytest

Things That Caught My Attention Last Week - January 26

Software Architecture

Systems Thinking Meets Simplicity-First: A Decision Framework for Software Architects by Chris Woodruff

You Can't Future-Proof Software Architecture by Derek Comartin

.NET

Making foreach on an IEnumerable allocation-free using reflection and dynamic methods by Andrew Lock

Enterprise Patterns for ASP.NET Core Minimal API: Data Transfer Object Pattern by Chris Woodruff

Simple OCR and NER Feature Extraction in C# with ONNX by Scott Galloway

C# Console menus with Actions by Karen Payne

Marten's Aggregation Projection Subsystem by Jeremy D. Miller

Filtering as domain logic by Mark Seemann

REST/APIs

Azure

AI Agents MCP Cosmos DB Transforming Development by Mark Brown

Azure Boards Additional Field Filters in Preview by Dan Hellem

Software Development

C++ has scope_exit for running code at scope exit. C# says "We have scope_exit at home." by Raymond Chen

Remaking the Linux "touch" command in PowerShell by Cassidy Williams

Signal ping: Code is easy; ownership is not by Mike Amundsen

Cleveland Tech is not Dead! by sadukie

A Practical Demo of Zero-Downtime Migrations Using Password Hashing by Milan Jovanović

AI

Context windows, Plan agent, and TDD: What I learned building a countdown app with GitHub Copilot by Chris Reddington
