Sr. Content Developer at Microsoft, working remotely in PA, TechBash conference organizer, former Microsoft MVP, Husband, Dad and Geek.

Using RealTime AI – Part 1: Getting Started with the Fundamentals of Low-Latency AI Magic


Have you ever wished your AI could keep up with you—like, actually match your pace? You know, the kind of speed where you toss out a question and get a snappy reply before you’ve even blinked twice? Enter Realtime AI—a total game-changer that, I’ll admit, had me grinning like I’d just unlocked a secret superpower the first time I got it running.

In this first installment of the RealTime AI series, I’ll break down what Realtime AI is, explain why it’s awesome, and give you a first look at the RealTime AI App—a fun demo app that brings this tech to life. Let’s jump in!

What’s RealTime AI All About?

Imagine traditional AI as that friend who takes a while to text back—you send a message, twiddle your thumbs, and hope they reply before you’ve lost interest. RealTime AI, though? It’s like a live call—immediate, fluid, and right there with you. Powered by the OpenAI Realtime API and model, it’s designed to deliver low-latency, multimodal magic, processing voice and text inputs in milliseconds for conversations that feel as natural as chatting with a friend.

The secret? It’s built on models like gpt-4o-realtime that are optimized for real-time interaction. The realtime models handle everything from voice activity detection to audio streaming, and even throw in function calling support to let your AI take action—like pulling up customer info, formatting the response a specific way, or placing an order mid-chat (see the sketch below). It’s a one-stop shop for building seamless, expressive experiences without resorting to multiple calls to different AI models. What’s really great is that it accepts either audio or text inputs from the user.
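
To make the function calling idea concrete, here’s a minimal sketch of the kind of session.update event you’d send to register a tool. The look_up_customer function, its parameters, and the instructions are hypothetical, and the event shape follows the public beta docs, so treat this as an illustration rather than the app’s actual code.

```typescript
// Hypothetical session.update event registering a tool the model may call.
// Event and tool shapes follow the Realtime API beta docs; verify them
// against the current API reference before relying on this.
const sessionUpdate = {
  type: "session.update",
  session: {
    modalities: ["text", "audio"],
    instructions: "You are a friendly order-taking assistant.",
    turn_detection: { type: "server_vad" }, // server-side voice activity detection
    tools: [
      {
        type: "function",
        name: "look_up_customer", // hypothetical function name
        description: "Fetch a customer's profile by their account number.",
        parameters: {
          type: "object",
          properties: {
            accountNumber: { type: "string" },
          },
          required: ["accountNumber"],
        },
      },
    ],
    tool_choice: "auto",
  },
};

// Once the session's WebSocket is open (see the connection sketch below):
// socket.send(JSON.stringify(sessionUpdate));
```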

Why It’s a Big Deal

Have you ever tried cobbling together a voice assistant with separate speech recognition, text processing, and text-to-speech models? It’s challenging enough to make you want to pull your hair out. The RealTime API flips that script. It streams audio inputs and outputs directly, handles interruptions like a seasoned conversationalist (think ChatGPT’s advanced voice mode), and does it all through a single API—a minimal connection sketch follows.
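
To give a feel for how little ceremony that single API involves, here’s a minimal Node.js sketch using the ws package. The model name and event types come from the public beta docs and may change, so treat it as a starting point rather than production code.

```typescript
// Minimal sketch: one WebSocket session to the OpenAI Realtime API.
// Assumes OPENAI_API_KEY is set in the environment.
import WebSocket from "ws";

const socket = new WebSocket(
  "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview",
  {
    headers: {
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      "OpenAI-Beta": "realtime=v1",
    },
  },
);

socket.on("open", () => {
  // Ask for a reply in both text and audio in a single round trip.
  socket.send(
    JSON.stringify({
      type: "response.create",
      response: { modalities: ["text", "audio"] },
    }),
  );
});

socket.on("message", (raw) => {
  const event = JSON.parse(raw.toString());
  if (event.type === "response.text.delta") {
    process.stdout.write(event.delta); // text streams in as it's generated
  }
});
```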

An app with RealTime AI support can:

  • Teach a user a language and even check their pronunciation.
  • Allow a user to speak to a form and have it filled out automatically.
  • Provide a user with information about employee benefits as they talk with the RealTime AI assistant.
  • Allow a customer to place an order using their voice, check an order’s status, and more.
  • Many more scenarios…

RealTime AI App in Action

The RealTime AI App is a web-based project that I wrote using Angular on the front end and Node.js on the back end. It really shows off what this tech can do and provides two main features.

  • Language Coach: Speak a phrase like “Hola, ¿cómo te llamas?” and it’ll chime in with, “Nice, but emphasize the ‘c’ more in ‘cómo’!” It’s a patient, kind tutor for language and pronunciation.
  • Medical Form Assistant: Say “Patient John Smith, 42 years old, history of pneumonia” and it returns a JSON object like { "name": "John Smith", "age": "42", "notes": "pneumonia" } and fills in a form for you. Medical assistants, nurses, and doctors can speak directly to a form (no keyboard required) as they hurry around a busy hospital and have the form automatically filled in with patient details (see the sketch just below).
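
Here’s a rough sketch of how that JSON might be pulled out of the Realtime API’s function call events on the server side, assuming a fill_patient_form tool was registered for the session. The tool name and the exact event shape are assumptions based on the beta docs, and a real handler would also match the pending call_id.

```typescript
// Sketch: extracting structured patient fields from a completed function
// call. The event type follows the Realtime API beta; treat it as an assumption.
interface PatientFields {
  name: string;
  age: string;
  notes: string;
}

function handleRealtimeEvent(raw: string): PatientFields | null {
  const event = JSON.parse(raw);

  // Fired once the model has finished streaming a function call's arguments.
  if (event.type === "response.function_call_arguments.done") {
    // `arguments` arrives as a JSON string, e.g.
    // '{"name":"John Smith","age":"42","notes":"pneumonia"}'
    return JSON.parse(event.arguments) as PatientFields;
  }
  return null; // not a form-filling event; other handlers deal with it
}
```

The parsed object can then be pushed to the Angular client over the session’s client-facing WebSocket so the form fields update as the user speaks.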

Since there are several parts to the RealTime AI App, I’ll break it down into individual pieces in a series of follow-up posts. In the meantime, here’s a high-level overview of the key parts of the app.

  • Client: This is you—the user interacting with the app via your browser. It sends audio or text inputs (like saying “Hello” or typing a question) to kick things off. It’s written using Angular.
  • RealTime Session: The Node.js code where the main action takes place: it manages the flow. It uses a client WebSocket to receive your inputs and send back responses, while a RealTime AI WebSocket connects to the OpenAI API. The logic block processes messages, ensuring everything runs smoothly between the client and the AI (a relay sketch follows the diagram below).
  • OpenAI RealTime API: This is the brains of the operation. It receives audio/text from the Realtime Session, processes it with the gpt-4o-realtime model, and sends back audio/text responses. The app supports calling OpenAI or Azure OpenAI.
[Diagram: RealTime AI App showing the client, RealTime Session, and OpenAI Realtime API interaction.]
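
To ground the diagram, here’s a stripped-down sketch of that relay pattern: one WebSocket per browser client, one upstream WebSocket to the Realtime API, and events forwarded in both directions. The port, model name, and error handling are simplified assumptions, not the app’s actual code.

```typescript
// Minimal relay sketch (Node.js + "ws"): browser <-> session <-> OpenAI.
import WebSocket, { WebSocketServer } from "ws";

const server = new WebSocketServer({ port: 8080 }); // hypothetical port

server.on("connection", (client) => {
  // One upstream Realtime connection per browser client.
  const openai = new WebSocket(
    "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview",
    {
      headers: {
        Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
        "OpenAI-Beta": "realtime=v1",
      },
    },
  );

  // Browser -> OpenAI: forward input events once the upstream is open.
  client.on("message", (msg) => {
    if (openai.readyState === WebSocket.OPEN) openai.send(msg.toString());
  });

  // OpenAI -> Browser: stream response events straight back.
  openai.on("message", (msg) => client.send(msg.toString()));

  // Tear down the pair together.
  client.on("close", () => openai.close());
  openai.on("close", () => client.close());
});
```

A real version would also buffer client messages that arrive before the upstream connection opens and translate between the client’s message format and the Realtime event schema; that’s the job of the logic block in the diagram.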

What’s Next?

This is just the start! In Part 2, I’ll dive into the server-side details—Node.js, WebSockets, and some code to tie it all together. You’ll also see how prompts play a key role in determining the type of responses returned to the user. Stay tuned!

Found this helpful? Share it with your crew and follow me for the next installment.


#496: Scaf: Complete blueprint for new Python Kubernetes projects

Today we explore the wild world of Python deployment with my friend, Calvin Hendryx-Parker from Six Feet Up. We’ll tackle some of the biggest challenges in taking a Python app from “it works on my machine” to production, covering inconsistent environments, conflicting dependencies, and sneaky security pitfalls. Along the way, Calvin shares how containerization with Docker and Kubernetes can both simplify and complicate deployments, especially for smaller teams. Finally, we’ll introduce Scaf, a powerful project blueprint designed to give developers a rock-solid start on Python web projects of all sizes.

Get notified when the Talk Python in Production book goes live and read the first third online right now.

Episode sponsors

Posit
Python in Production
Talk Python Courses

Calvin Hendryx-Parker: github.com
Scaf on GitHub: github.com

"Deploy the Dream" song: deploy-the-dream-talk-python.mp3

CloudDevEngineering YouTube Channel: youtube.com
TechWorld with Nana YouTube Channel: youtube.com
Tilt (Kubernetes Dev Tool): tilt.dev
Talos (Minimal OS for Kubernetes): talos.dev
Traefik Reverse Proxy: traefik.io
Sealed Secrets on GitHub: github.com
Argo CD Documentation: readthedocs.io
MailHog on GitHub: github.com
Next.js: nextjs.org
Cloud Custodian: cloudcustodian.io
Valkey (Redis Replacement): valkey.io
“The ‘Works on My Machine’ Certification Program” (Coding Horror): blog.codinghorror.com
NVIDIA’s First Desktop AI PC (Ars Technica): arstechnica.com
Kind (Kubernetes in Docker): kind.sigs.k8s.io

Updated Effective PyCharm Course: training.talkpython.fm
Talk Python in Production book: talkpython.fm/books/python-in-production
Watch this episode on YouTube: youtube.com
Episode transcripts: talkpython.fm

--- Stay in touch with us ---
Subscribe to Talk Python on YouTube: youtube.com
Talk Python on Bluesky: @talkpython.fm at bsky.app
Talk Python on Mastodon: talkpython
Michael on Bluesky: @mkennedy.codes at bsky.app
Michael on Mastodon: mkennedy




Download audio: https://talkpython.fm/episodes/download/496/scaf-complete-blueprint-for-new-python-kubernetes-projects.mp3

How to become a self-taught developer while supporting a family [Podcast #164]


On this week's episode of the podcast, I interview Jesse Hall. He's a software engineer and a developer advocate at MongoDB. He taught himself to code while raising kids and working at Best Buy's Geek Squad fixing computers.

Jesse has created tons of tutorials over the years on YouTube and on freeCodeCamp. We talk about his coding journey, how the field has changed over the past few years, and how hype has distorted people's perception of getting into code.

We talk about:

  • Growing up in a one-stoplight town

  • Teaching himself to code for free using freeCodeCamp

  • How he created YouTube tutorials to inspire his kids, then got quite good at it

  • How Jesse's early interest in Web3 led to him needing to "dig himself out of the grave" of being "the NFT tutorial guy"

Support also comes from the 11,384 kind folks who support freeCodeCamp through a monthly donation. Join them and help our mission by going to https://www.freecodecamp.org/donate

You can watch the interview on YouTube:

Or you can listen to the podcast in Apple Podcasts, Spotify, or your favorite podcast app. Be sure to follow the freeCodeCamp Podcast there so you'll get new episodes each Friday.

Links we talk about during our conversation:




The Trump administration is coming for student protesters


The Trump administration is embarking on a massive university speech crackdown, starting with Columbia University, where it’s demanding external control of entire departments and punishment for student activists. Its first test case, Mahmoud Khalil, a graduate student with a green card, offers a hint of what’s to come: a state of intentional chaos that undermines free speech and due process rights. Thus far, Columbia appears to be complying with the administration’s demands, even as its students gear up to fight back.

Department of Homeland Security (DHS) agents raided Columbia University’s campus on Thursday night, looking for students in two residential buildings, according to a university-wide email sent by Columbia’s interim president Katrina Armstrong. At a press conference on Friday, deputy attorney general Todd Blanche said the Justice Department is investigating whether student protesters at Columbia violated federal terrorism laws and that it would prosecute “any person engaging in material support of terrorism.” Hours before the raids, Columbia received a joint letter from three government agencies demanding that it punish student protesters; empower “in …

Read the full story at The Verge.


Everything You Say To Your Echo Will Be Sent To Amazon Starting On March 28

An anonymous reader quotes a report from Ars Technica: In an email sent to customers today, Amazon said that Echo users will no longer be able to set their devices to process Alexa requests locally and, therefore, avoid sending voice recordings to Amazon's cloud. Amazon apparently sent the email to users with "Do Not Send Voice Recordings" enabled on their Echo. Starting on March 28, recordings of everything spoken to the Alexa living in Echo speakers and smart displays will automatically be sent to Amazon and processed in the cloud. Attempting to rationalize the change, Amazon's email said: "As we continue to expand Alexa's capabilities with generative AI features that rely on the processing power of Amazon's secure cloud, we have decided to no longer support this feature."

One of the most marketed features of Alexa+ is its more advanced ability to recognize who is speaking to it, a feature known as Alexa Voice ID. To accommodate this feature, Amazon is eliminating a privacy-focused capability for all Echo users, even those who aren't interested in the subscription-based version of Alexa or want to use Alexa+ but not its ability to recognize different voices. [...]

Amazon said in its email today that by default, it will delete recordings of users' Alexa requests after processing. However, anyone with their Echo device set to "Don't save recordings" will see their already-purchased devices' Voice ID feature bricked. Voice ID enables Alexa to do things like share user-specified calendar events, reminders, music, and more. Previously, Amazon has said that "if you choose not to save any voice recordings, Voice ID may not work." As of March 28, broken Voice ID is a guarantee for people who don't let Amazon store their voice recordings.

Amazon's email continues: "Alexa voice requests are always encrypted in transit to Amazon's secure cloud, which was designed with layers of security protections to keep customer information safe. Customers can continue to choose from a robust set of controls by visiting the Alexa Privacy dashboard online or navigating to More - Alexa Privacy in the Alexa app."

Further reading: Google's Gemini AI Can Now See Your Search History

Read more of this story at Slashdot.


OpenAI and Google ask the government to let them train AI on content they don’t own


OpenAI and Google are pushing the US government to allow their AI models to train on copyrighted material. Both companies outlined their stances in proposals published this week, with OpenAI arguing that applying fair use protections to AI “is a matter of national security.”

The proposals come in response to a request from the White House, which asked governments, industry groups, private sector organizations, and others for input on President Donald Trump’s “AI Action Plan.” The initiative is supposed to “enhance America’s position as an AI powerhouse,” while preventing “burdensome requirements” from impacting innovation.

In its comment, OpenAI claims that allowing AI companies to access copyrighted content would help the US “avoid forfeiting” its lead in AI to China, while calling out the rise of DeepSeek.

“There’s little doubt that the PRC’s [People’s Republic of China] AI developers will enjoy unfettered access to data — including copyrighted data — that will improve their models,” OpenAI writes. “If the PRC’s developers have unfettered access to data and American companies are left without fair use access, the race for AI is effectively over.”

Google, unsurprisingly, agrees. The company’s response similarly states that copyright, privacy, and patent policies “can impede appropriate access to data necessary for training leading models.” It adds that fair use policies, along with text and data mining exceptions, have been “critical” to training AI on publicly available data.

“These exceptions allow for the use of copyrighted, publicly available material for AI training without significantly impacting rightsholders and avoid often highly unpredictable, imbalanced, and lengthy negotiations with data holders during model development or scientific experimentation,” Google says.

Anthropic, the AI company behind the AI chatbot Claude, also submitted a proposal, but it doesn’t mention anything about copyright. Instead, it asks the US government to develop a system to assess an AI model’s national security risks and to strengthen export controls on AI chips. Like Google and OpenAI, Anthropic also suggests that the US bolster its energy infrastructure to support the growth of AI.

Many AI companies have been accused of ripping copyrighted content to train their AI models. OpenAI currently faces several lawsuits from news outlets, including The New York Times, and has even been sued by well-known names like Sarah Silverman and George R.R. Martin. Apple, Anthropic, and Nvidia have also been accused of scraping YouTube subtitles to train AI, which YouTube has said violates its terms.
