Sr. Content Developer at Microsoft, working remotely in PA, TechBash conference organizer, former Microsoft MVP, Husband, Dad and Geek.

How to Perform Secure Hashing Using Python's hashlib Module


Hashing is a fundamental technique in programming that converts data into a fixed-size string of characters. Unlike encryption, hashing is a one-way process: you can't reverse it to get the original data back.

This makes hashing perfect for storing passwords, verifying file integrity, and creating unique identifiers. In this tutorial, you'll learn how to use Python's built-in hashlib module to implement secure hashing in your applications.

By the end of this tutorial, you'll understand:

  • How to create basic hashes with different algorithms

  • Why simple hashing isn't enough for passwords

  • How to add salt to prevent rainbow table attacks

  • How to use key derivation functions for password storage

You can find the code on GitHub.

Prerequisites

To follow this tutorial, you should have:

  • Basic Python: Variables, data types, functions, and control structures

  • Understanding of strings and bytes: How to encode strings and work with byte data

No external libraries are required, as hashlib and os are both part of Python's standard library.
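Since hashlib works on bytes rather than strings, here's a quick refresher sketch on converting between the two:

```python
# hashlib functions accept bytes, not str, so text must be encoded first
text = "Hello, World!"
data = text.encode("utf-8")  # str -> bytes

print(type(data))            # <class 'bytes'>
print(data.decode("utf-8"))  # bytes -> str round-trip
```

UTF-8 is Python's default encoding, so `message.encode()` and `message.encode("utf-8")` are equivalent.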

Table of Contents

  1. Basic Hashing with Python's hashlib

  2. Why Simple Hashing Isn't Enough for Passwords

  3. Adding Salt to Your Hashes

  4. Verifying Salted Passwords

  5. Using Key Derivation Functions

Basic Hashing with Python’s hashlib

Let's start with the fundamentals. The hashlib module provides access to several hashing algorithms like MD5, SHA-1, SHA-256, and more.
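Before picking an algorithm, you can ask hashlib which ones your Python build supports. A quick sketch (the exact set of available algorithms varies by platform and OpenSSL build):

```python
import hashlib

# Always present, on every platform
print(sorted(hashlib.algorithms_guaranteed))

# Everything the underlying OpenSSL build exposes (a superset of the above)
print(len(hashlib.algorithms_available))

# Digest sizes differ per algorithm (in bytes)
for name in ("sha1", "sha256", "sha512"):
    print(name, hashlib.new(name).digest_size)
```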

Here's how to create a simple SHA-256 hash:

import hashlib

# Create a simple hash
message = "Hello, World!"
hash_object = hashlib.sha256(message.encode())
hex_digest = hash_object.hexdigest()

print(f"Original: {message}")
print(f"SHA-256 Hash: {hex_digest}")

Output:

Original: Hello, World!
SHA-256 Hash: dffd6021bb2bd5b0af676290809ec3a53191dd81c7f70a4b28688a362182986f

Here, we import the hashlib module and encode the string to bytes with .encode(), because hashlib operates on bytes, not strings.

Then we create a hash object with hashlib.sha256() and get the hexadecimal representation with .hexdigest().

The resulting hash is always 64 characters long regardless of input size: SHA-256 produces a 256-bit digest, and since each hexadecimal character encodes 4 bits, that gives 256 / 4 = 64 hex characters. Even changing one character produces a completely different hash.

Let's verify that:

import hashlib

# Small change, big difference
message1 = "Hello, World!"
message2 = "Hello, World?"  # Only changed ! to ?

hash1 = hashlib.sha256(message1.encode()).hexdigest()
hash2 = hashlib.sha256(message2.encode()).hexdigest()

print(f"Message 1: {message1}")
print(f"Hash 1:    {hash1}")
print(f"\nMessage 2: {message2}")
print(f"Hash 2:    {hash2}")
print(f"\nAre they the same? {hash1 == hash2}")

Output:

Message 1: Hello, World!
Hash 1:    dffd6021bb2bd5b0af676290809ec3a53191dd81c7f70a4b28688a362182986f

Message 2: Hello, World?
Hash 2:    f16c3bb0532537acd5b2e418f2b1235b29181e35cffee7cc29d84de4a1d62e4d

Are they the same? False

This property is called the avalanche effect: a tiny change in the input produces a completely different output.
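This same sensitivity is what makes SHA-256 useful for file integrity checks. Here's a sketch that hashes a file in fixed-size chunks, so even large files never need to fit in memory (the filename is just an example):

```python
import hashlib

def sha256_of_file(path, chunk_size=65536):
    """Hash a file incrementally, 64 KB at a time."""
    hash_object = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hash_object.update(chunk)  # feed each chunk into the running hash
    return hash_object.hexdigest()

# Create a small file to demonstrate
with open("example.txt", "wb") as f:
    f.write(b"Hello, World!")

print(sha256_of_file("example.txt"))
# dffd6021bb2bd5b0af676290809ec3a53191dd81c7f70a4b28688a362182986f
```

Because .update() can be called repeatedly, the chunked result is identical to hashing the whole file at once.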

Why Simple Hashing Isn't Enough for Passwords

You might think you can just hash passwords and store them in your database. But there's a problem: attackers use rainbow tables, which are precomputed databases of hashes for common passwords.

Here's what happens:

import hashlib

# Simple password hashing (DON'T USE THIS!)
password = "password123"
hashed = hashlib.sha256(password.encode()).hexdigest()

print(f"Password: {password}")
print(f"Hash: {hashed}")

Output:

Password: password123
Hash: ef92b778bafe771e89245b89ecbc08a44a4e166c06659911881f383d4473e94f

If two users have the same password, they'll have identical hashes. An attacker who cracks one hash knows the password for all users with that hash.
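You can simulate a toy rainbow-table lookup yourself — a sketch using a handful of made-up common passwords:

```python
import hashlib

# A toy "rainbow table": precomputed hashes of common passwords.
# Real attack tables contain billions of entries.
common_passwords = ["123456", "qwerty", "letmein", "password123"]
rainbow_table = {
    hashlib.sha256(p.encode()).hexdigest(): p for p in common_passwords
}

# An unsalted hash leaked from a database is reversed by a simple lookup
leaked_hash = "ef92b778bafe771e89245b89ecbc08a44a4e166c06659911881f383d4473e94f"
print(rainbow_table.get(leaked_hash, "not found"))  # password123
```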

So how do we fix this? Let's find out in the next section.

Adding Salt to Your Hashes

The solution is salting: adding random data to each password before hashing. This way, even identical passwords produce different hashes.

Here's how to implement salted hashing:

import hashlib
import os

def hash_password_with_salt(password):
    # Generate a random salt (16 bytes = 128 bits)
    salt = os.urandom(16)

    # Combine password and salt, then hash
    hash_object = hashlib.sha256(salt + password.encode())
    password_hash = hash_object.hexdigest()

    # Return both salt and hash (you need the salt to verify later)
    return salt.hex(), password_hash

# Hash the same password twice
password = "password123"

salt1, hash1 = hash_password_with_salt(password)
salt2, hash2 = hash_password_with_salt(password)

print(f"Password: {password}\n")
print(f"First attempt:")
print(f"  Salt: {salt1}")
print(f"  Hash: {hash1}\n")
print(f"Second attempt:")
print(f"  Salt: {salt2}")
print(f"  Hash: {hash2}\n")
print(f"Same password, different hashes? {hash1 != hash2}")

Output:

Password: password123

First attempt:
  Salt: fc24b2d2245ff65b80c5bced38744171
  Hash: 5ce634c05941d25871e7ee334b5c24c75f64c4f6d557db66909fcaa793d869f9

Second attempt:
  Salt: bc8a1f79b07e56b51285557211f88bb0
  Hash: 043599d90b2aa0556265869cead35724c7d9d9d37129d897c6b68bade9e737e6

Same password, different hashes? True

How this works:

  • os.urandom(16) generates 16 random bytes, which is our salt

  • We concatenate the salt and password bytes before hashing

  • We return both the salt (as hex) and the hash

  • You must store both the salt and hash in your database

When a user logs in, you retrieve their salt, hash the entered password with that salt, and compare the result to the stored hash.

Verifying Salted Passwords

Now let's create a function to verify passwords against salted hashes:

import hashlib
import hmac
import os

def hash_password(password, salt=None):
    """Hash a password with a salt. Generate a new salt if none is provided."""
    if salt is None:
        salt = os.urandom(16)
    elif isinstance(salt, str):
        # Convert a hex string back to bytes
        salt = bytes.fromhex(salt)

    password_hash = hashlib.sha256(salt + password.encode()).hexdigest()
    return salt.hex(), password_hash

def verify_password(password, stored_salt, stored_hash):
    """Verify a password against a stored salt and hash."""
    # Hash the provided password with the stored salt
    _, new_hash = hash_password(password, stored_salt)

    # Use hmac.compare_digest for a constant-time comparison
    # that doesn't leak timing information to an attacker
    return hmac.compare_digest(new_hash, stored_hash)

Here’s how you can use the above:

print("=== User Registration ===")
user_password = "mySecurePassword!"
salt, password_hash = hash_password(user_password)
print(f"Password: {user_password}")
print(f"Salt: {salt}")
print(f"Hash: {password_hash}")

# Simulate user login attempts
print("\n=== Login Attempts ===")
correct_attempt = "mySecurePassword!"
wrong_attempt = "wrongPassword"

print(f"Attempt 1: '{correct_attempt}'")
print(f"  Valid? {verify_password(correct_attempt, salt, password_hash)}")

print(f"\nAttempt 2: '{wrong_attempt}'")
print(f"  Valid? {verify_password(wrong_attempt, salt, password_hash)}")

Output:

=== User Registration ===
Password: mySecurePassword!
Salt: 381779b5262deea84183e4b9454b98b1
Hash: 9756e1f0bc4c1aa4a72f35b0be8d3c8f430d31613371cf7de3c615bc475de98f

=== Login Attempts ===
Attempt 1: 'mySecurePassword!'
  Valid? True

Attempt 2: 'wrongPassword'
  Valid? False

This implementation shows a complete registration and login flow.

Using Key Derivation Functions

While salted SHA-256 is better than plain hashing, modern applications should use key derivation functions (KDFs) specifically designed for password hashing, such as PBKDF2 (Password-Based Key Derivation Function 2), bcrypt, scrypt, and Argon2.

These algorithms are intentionally slow and require more computational resources, making brute-force attacks much harder. Let's implement PBKDF2, which is built into Python:

import hashlib
import hmac
import os

def hash_password_pbkdf2(password, salt=None, iterations=600000):
    """Hash password using PBKDF2 with SHA-256."""
    if salt is None:
        salt = os.urandom(32)  # 32 bytes = 256 bits
    elif isinstance(salt, str):
        salt = bytes.fromhex(salt)

    # PBKDF2 with 600,000 iterations (OWASP recommendation for 2024)
    password_hash = hashlib.pbkdf2_hmac(
        'sha256',          # Hash algorithm
        password.encode(), # Password as bytes
        salt,              # Salt as bytes
        iterations,        # Number of iterations
        dklen=32           # Desired key length (32 bytes = 256 bits)
    )

    return salt.hex(), password_hash.hex(), iterations

def verify_password_pbkdf2(password, stored_salt, stored_hash, iterations):
    """Verify password against a PBKDF2 hash."""
    _, new_hash, _ = hash_password_pbkdf2(password, stored_salt, iterations)
    # Constant-time comparison to avoid leaking timing information
    return hmac.compare_digest(new_hash, stored_hash)

# Hash a password
print("=== PBKDF2 Password Hashing ===")
password = "SuperSecure123!"
salt, hash_value, iterations = hash_password_pbkdf2(password)

print(f"Password: {password}")
print(f"Salt: {salt}")
print(f"Hash: {hash_value}")
print(f"Iterations: {iterations:,}")

This outputs:

=== PBKDF2 Password Hashing ===
Password: SuperSecure123!
Salt: b388aecd774f6a7ddd95405091548bb50102c99beb1a10326a4c54070da4a3a5
Hash: c681450f41d0cec9ea2aad1108efe2a430b9c3d9fc3af621071be10ac9b3615a
Iterations: 600,000

Now let’s verify the password and also compare the speeds of SHA-256 vs. PBKDF2:

print("\n=== Verification ===")
is_valid = verify_password_pbkdf2(password, salt, hash_value, iterations)
print(f"Password valid? {is_valid}")

# Show time comparison
import time

print("\n=== Speed Comparison ===")
test_password = "test123"

# Simple SHA-256
start = time.time()
for _ in range(100):
    hashlib.sha256(test_password.encode()).hexdigest()
sha256_time = time.time() - start

# PBKDF2
start = time.time()
for _ in range(100):
    hash_password_pbkdf2(test_password)
pbkdf2_time = time.time() - start

print(f"100 SHA-256 hashes: {sha256_time:.3f} seconds")
print(f"100 PBKDF2 hashes: {pbkdf2_time:.3f} seconds")
print(f"PBKDF2 is {pbkdf2_time/sha256_time:.1f}x slower")

Output:


=== Verification ===
Password valid? True

=== Speed Comparison ===
100 SHA-256 hashes: 0.000 seconds
100 PBKDF2 hashes: 53.631 seconds
PBKDF2 is 240068.1x slower

How PBKDF2 works:

  • Takes your password and salt

  • Applies the hash function (SHA-256) repeatedly – 600,000 times in this example

  • Each added iteration makes the computation slower, and therefore harder to brute-force

  • You store the salt, hash, AND iteration count (so you can verify later)

The iteration count can be increased over time as computers get faster. Modern recommendations (2024) suggest 600,000 iterations for PBKDF2-SHA256.
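To keep those three values together, one common pattern is to pack them into a single delimited string (the exact format below is just an illustration, not a standard):

```python
import hashlib
import os

def encode_record(salt: bytes, digest: bytes, iterations: int) -> str:
    # Pack algorithm label, iteration count, salt, and hash into one field
    return f"pbkdf2_sha256${iterations}${salt.hex()}${digest.hex()}"

def decode_record(record: str):
    _algorithm, iterations, salt_hex, digest_hex = record.split("$")
    return bytes.fromhex(salt_hex), bytes.fromhex(digest_hex), int(iterations)

salt = os.urandom(16)
digest = hashlib.pbkdf2_hmac("sha256", b"SuperSecure123!", salt, 600_000)
record = encode_record(salt, digest, 600_000)

restored_salt, restored_digest, restored_iters = decode_record(record)
print(restored_salt == salt and restored_digest == digest)  # True
```

Storing the iteration count in each record lets you raise it later for new passwords while still being able to verify old ones.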

Conclusion

You've learned how to implement secure password hashing in Python using the hashlib module. Here are the key takeaways:

  • Basic hashing with SHA-256 is useful for data integrity, not passwords

  • Salting prevents rainbow table attacks by making each hash unique

  • PBKDF2 adds computational cost through iterations, slowing down attackers

  • Always store the salt, hash, and iteration count together

  • Use key derivation functions (PBKDF2, bcrypt, Argon2) for passwords

The code examples in this tutorial provide a solid foundation for implementing authentication in your projects. But remember, security is an ongoing process. Stay updated on best practices and regularly review your security implementations.

Happy (secure) coding!



Read the whole story
alvinashcraft
56 minutes ago
reply
Pennsylvania, USA
Share this story
Delete

How to Analyse Large CSV Files with Local LLMs in C#


Vibe Coding is a Technical Debt Factory


I recently sat in a meeting where a Product Manager told me he had "vibe coded" a prototype over the weekend. He was ecstatic. He had spoken to a computer, told it his dreams, and the computer had spat out a working React application. He felt like a wizard. He felt like the future had finally arrived to liberate him from the tyranny of engineers like me.

Then I looked at the code.

It ran. I will give him that. It rendered pixels on the screen. But beneath the shiny interface lay a subterranean horror show of hard-coded secrets, duplicated state logic, and security vulnerabilities so gaping you could drive a truck through them. It was not software. It was a facade.

We are currently living through a mass delusion. The industry has latched onto a new term. Vibe Coding. The idea is simple. You don't need to know how to code. You just need to know the "vibe" of what you want. You supply the vision. The AI supplies the implementation.

It sounds magical. It sounds like the democratization of creation.

It is actually a catastrophe in waiting.

Is "Slop" the New Standard?

The narrative selling this dream is seductive. It tells us that the barrier to entry for software engineering has been artificially high. It argues that syntax is a gatekeeper preventing brilliant "idea guys" from building the next unicorn.

The proponents of Vibe Coding believe that natural language is the ultimate programming interface. They argue that we are moving from a deterministic era of explicit instruction to a probabilistic era of intent. In this worldview, the "how" is irrelevant. Only the "what" matters.

Platforms like Lovable and a thousand other "text-to-app" wrappers have sprung up to service this belief. They promise a world where you simply describe your application and it manifests. No debugging. No architectural diagrams. No understanding of memory management or API latency. Just pure creation.

The orthodoxy states that traditional coding skills are becoming obsolete. Why learn to invert a binary tree or understand the difference between TCP and UDP when an LLM can do it for you in seconds?

The answer is simple: Because the LLM doesn't understand it either. It just statistically predicts that it should be there.

If this utopia were real, we would see a golden age of software stability and innovation. We are seeing the opposite. The data is starting to pour in, and it paints a grim picture of the "Vibe Coding" reality. We are not building better software. We are building worse software faster than ever before.

GitClear analyzed over 150 million lines of code changed between 2020 and 2024. Their findings are damning. They found a massive increase in "code churn"—code that is written and then almost immediately deleted or rewritten. Even more worrying, they found an eight-fold increase in duplicated code blocks.

This is the hallmark of copy-paste programming. This is not efficiency. This is thrashing.

The Anatomy of the "Vibe"

Let's look at what happens inside the "Vibe" engine. This is speculation based on observation, but it aligns with the behaviour of every LLM I've tested.

When a non-technical user asks for a feature, they ask for the happy path. They ask for the visible result.

INPUT: "Make a dashboard for user metrics. I want to see daily active users."

The Vibe Coder (the human) thinks they have specified the software. They haven't. They have specified a UI.

Here is what the AI generates. This is the code that "runs" but ruins your life later.

The Vibe Implementation (What the PM ships)

// UserDashboard.js
// The AI generated this. It looks great in the demo.
import React, { useState, useEffect } from 'react';

function UserDashboard() {
  const [data, setData] = useState(null);

  // PROBLEM 1: This runs on every mount. No caching. No deduping.
  // If the user tabs away and back, we hammer the API.
  useEffect(() => {
    // PROBLEM 2: Direct fetch in component. Tightly coupled.
    // If we change the auth method, we have to rewrite every single component.
    fetch('https://api.myapp.com/metrics')
      .then(res => res.json())
      .then(data => setData(data))
      // PROBLEM 3: What happens on 401? 500? Network failure?
      // The AI doesn't care. The "vibe" is success, not failure handling.
      .catch(err => console.log(err)); 
  }, []);

  if (!data) return <div>Loading...</div>;

  return (
    <div>
       {/* PROBLEM 4: Direct property access without optional chaining or schema validation.
           If the API changes the shape of 'daily_users', the whole app crashes (White Screen of Death). */}
      <h1>Daily Users: {data.daily_users}</h1>

      {/* PROBLEM 5: Inline mapping logic that belongs in a transformer/selector layer. */}
      {data.history.map(item => (
        <div key={item.id}>{item.count}</div>
      ))}
    </div>
  );
}


The output is a perfect prototype. It is also a production nightmare.

It has introduced technical debt instantly. It has created a frontend dependency on an API that hasn't been designed properly. It has managed state locally in a way that will conflict with the rest of the application.

Now, let's look at what an engineer writes. Not because we love typing, but because we understand systems.

The Engineer Implementation (What actually works)

// useUserMetrics.ts
// The Engineer writes this. It handles reality.

import { useQuery } from '@tanstack/react-query'; // Using established libraries for caching
import { z } from 'zod'; // Runtime validation because APIs lie

// 1. Define the Schema. Contract first.
const MetricsSchema = z.object({
  daily_users: z.number(),
  history: z.array(z.object({
    id: z.string(),
    count: z.number()
  }))
});

export const useUserMetrics = () => {
  return useQuery({
    queryKey: ['metrics', 'daily'],
    queryFn: async () => {
      // 2. Abstraction layer for API calls
      const data = await apiClient.get('/metrics');

      // 3. Validation layer. If this fails, we know EXACTLY why.
      // We don't just crash the UI.
      const parsed = MetricsSchema.safeParse(data);
      if (!parsed.success) {
        throw new Error("API Contract Violation");
      }
      return parsed.data;
    },
    // 4. Configuration for stale-while-revalidate strategies
    staleTime: 1000 * 60 * 5, 
    retry: 2
  });
};


The "Vibe Coder" looks at the first example and sees success. It works. It's short.

The engineer looks at the first example and sees a rewrite. The second example is longer, yes. But it handles network flakiness, API drift, caching, and race conditions. The AI did not generate the second example because the prompt didn't ask for "runtime schema validation using Zod."

The prompt asked for a vibe.

The Context Gap

The Google 2025 DORA report reinforces this. They found that a 90% increase in AI adoption correlated with a 9% climb in bug rates and a massive 91% increase in code review time.

Think about that for a second. We are spending double the time reviewing code because the code we are generating is suspect.

The "Context Gap" is the killer here. The AI sees the file you are working on. It might see the file next to it. It does not see the legacy database schema from 2018 that you have to interface with. It does not understand the regulatory compliance requirement that forces you to encrypt that specific field. It does not "know" your architecture. It guesses.

When you write code by hand, you are forced to confront the details line by line. You have to think about the variable types. You have to think about the error handling. The friction of typing is a quality control mechanism. It forces your brain to engage with the logic.

When you generate code, you bypass that friction. You can generate a thousand lines of garbage in a second. This means the role of the engineer shifts from "writer" to "auditor". And auditing is harder than writing.

To audit code effectively, you need a deeper understanding of the system than the person (or machine) who wrote it. You need to spot the subtle bug that the AI introduced because it didn't understand the thread-safety model of your specific language version.

You cannot audit what you do not understand.

The Pseudo-Code of Failure

Let's look at the internal logic of a "Vibe Coding" session. This is what I suspect is happening when the abstraction leaks.

INPUT: "Update the user profile to allow multiple addresses."

VIBE_LAYER:
  - Retrieves 'User Profile' React Component.
  - Retrieves 'Address' Form.
  - Ignores 'Database' context (Foreign Keys).
  - Ignores 'Billing' context (Billing address must be unique).

GENERATION:
  - Change frontend to array of addresses.
  - Update API POST request payload.

RESULT:
  - Frontend looks correct. Allows adding 5 addresses.
  - Backend receives JSON array.
  - Database constraint (One-to-One) explodes.
  - 500 Internal Server Error.

The Vibe Coder hits the button. The UI updates. They smile. They deploy.

Then the support tickets start rolling in.

What This Actually Means

We are sitting on a time bomb of AI-generated technical debt. In two years, we will see a wave of high-profile failures in companies that went all-in on "Vibe Coding".

We will see security breaches caused by hallucinations. We are already seeing the user base for platforms like Lovable churn. People sign up. They build a demo. It looks great. They show their investors. Then they try to add a complex feature. They try to integrate a legacy payment gateway. They try to scale it to 10,000 users.

The system breaks. The code is a tangled mess of hallucinated logic and spaghetti dependencies. The user cannot fix it because they never understood it. They leave.

The market knows this. We are seeing a correction. The job market is not hiring "Vibe Coders". It is aggressively hiring senior engineers who can fix the mess.

This is the deeper truth: Vibe Coding is a lie because it relies on a fundamental misunderstanding of what software engineering is.

Coding is not the act of typing syntax. That is typing. Coding is the act of rigorous specification. It is the process of taking a vague, fuzzy human requirement and constraining it until it can be executed deterministically by a machine.

When you use a Vibe Coding platform, you are not skipping the hard part. You are ignoring it.

GenAI is probabilistic. If you feed it garbage, it smiles. It nods. It says "Certainly! Here is the code." It generates code that looks right. It adopts the "vibe" of correctness. This is where the term "slop" comes from. It is code that occupies space but provides no nutritional value to the system.

TL;DR For The Scrollers

  • Vibe Coding creates prototypes, not products. It handles the "happy path" and ignores the edge cases where software actually lives.
  • Churn is skyrocketing. GitClear data shows an 8x increase in duplicated code. We are generating slop, not solutions.
  • The friction is the point. The effort of writing code forces you to understand the logic. Removing the friction removes the understanding.
  • Auditing > Writing. The job of the future isn't prompt engineering; it's being a Senior Code Auditor who can spot the subtle lies the AI tells.
  • Context is King. AI lacks the historical and architectural context of your specific system. It guesses.

Edward Burton ships production AI systems and writes about the stuff that actually works. Skeptic of hype. Builder of things.

Production > Demos. Always.

More at tyingshoelaces.com

How many hours have you spent debugging code that an AI (or an "Idea Guy") claimed was "basically done"?



At AWS re:Invent, the news was agents, but the focus was developers

Four days, 60,000 developers, and AI generated perfume. The re:Invent that was.

Prompt Noise Is Killing Your AI Accuracy: How to Optimize Context for Grounded Output


The most common reason an AI system “hallucinates” in production isn’t that the model is dumb. It’s that we’re drowning it. In the last year, many teams have quietly adopted a pattern that looks sophisticated on paper: throw everything into the prompt. Policies, API schemas, examples, edge cases, brand voice, product specs, meeting notes, customer […]

The article Prompt Noise Is Killing Your AI Accuracy: How to Optimize Context for Grounded Output was originally published on Build5Nines.


Adding a model provider for Paste with AI


The post Adding a model provider for Paste with AI appeared first on Windows Developer Blog.
