At endjin, we maintain Corvus.JsonSchema, and in the previous post we looked at JSON Pointer resolution.
JSON has a deliberately simple type system - strings, numbers, booleans, null, objects, and arrays. But the data those types carry is often richer than the JSON grammar suggests. A string might be a URI. A number might have 50 significant digits. A date-time might need proper time zone handling. V5 extends the core type system with first-class support for all of these.
UTF-8 URIs and IRIs
JSON Schema defines four URI-related format keywords: uri, uri-reference, iri, and iri-reference. V5 validates and parses all four with zero-allocation ref struct types that operate directly on the UTF-8 bytes in the document buffer.
Utf8Uri and Utf8Iri
Utf8Uri is a readonly ref struct that parses a URI from a ReadOnlySpan<byte> without allocating. It gives you access to every component - scheme, authority, user, host, port, path, query, and fragment - as ReadOnlySpan<byte> slices into the original buffer:
Utf8Uri uri = Utf8Uri.CreateUri(
"https://api.example.com:8080/v1/users?active=true#top"u8);
// Each component is a ReadOnlySpan<byte> slice - no allocation
ReadOnlySpan<byte> scheme = uri.Scheme; // "https"
ReadOnlySpan<byte> host = uri.Host; // "api.example.com"
ReadOnlySpan<byte> path = uri.Path; // "/v1/users"
ReadOnlySpan<byte> query = uri.Query; // "active=true"
ReadOnlySpan<byte> fragment = uri.Fragment; // "top"
int port = uri.PortValue; // 8080
For schema-generated types with "format": "uri", the code generator emits a TryGetValue method and an explicit conversion operator:
// Schema: { "type": "string", "format": "uri" }
// Generated type: MyEndpoint
if (endpoint.TryGetValue(out Utf8UriValue uriValue))
{
    using (uriValue)
    {
        Utf8Uri uri = uriValue.Uri;
        // Access components via uri.Scheme, uri.Host, uri.Path, etc.
    }
}
// Or via explicit cast (throws FormatException if invalid)
using Utf8UriValue uriValue = (Utf8UriValue)endpoint;
Utf8UriValue is a regular (non-ref) struct that owns its backing memory. It implements IDisposable, so always dispose it, typically with a using declaration or a using statement, to return the backing buffer to the pool.
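To make the ownership rule concrete, here is a minimal sketch of the general pattern: a struct that rents a buffer from ArrayPool<byte> and hands it back in Dispose. PooledBytes is a hypothetical stand-in written for this illustration. It is not how Utf8UriValue is implemented, only the shape of the contract that a pooled, disposable value type implies.

```csharp
#nullable enable
using System;
using System.Buffers;

// Hypothetical stand-in for a value type that owns pooled memory.
// Illustrative only; not the Utf8UriValue implementation.
public struct PooledBytes : IDisposable
{
    private byte[]? rented;
    private readonly int length;

    public PooledBytes(ReadOnlySpan<byte> source)
    {
        // Rent may return a larger array than requested, so track the real length.
        rented = ArrayPool<byte>.Shared.Rent(source.Length);
        source.CopyTo(rented);
        length = source.Length;
    }

    // Empty once the buffer has gone back to the pool.
    public ReadOnlySpan<byte> Span =>
        rented is null ? ReadOnlySpan<byte>.Empty : rented.AsSpan(0, length);

    public void Dispose()
    {
        byte[]? buffer = rented;
        if (buffer is not null)
        {
            rented = null;
            ArrayPool<byte>.Shared.Return(buffer);
        }
    }
}
```

If Dispose is never called, the rented array is simply never returned, and the pool has to allocate a fresh one next time. That is the cost a using declaration avoids.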
URIs have two standard string representations. The canonical form percent-encodes reserved characters for safe transmission. The display form decodes those sequences for human readability:
Utf8Uri uri = Utf8Uri.CreateUri(
"https://example.com/caf%C3%A9?q=hello%20world"u8);
// Display form: decodes percent-encoded sequences for readability
// "https://example.com/café?q=hello world"
string display = uri.ToString();
// Canonical form: percent-encodes reserved characters for safe transmission
Span<byte> buffer = stackalloc byte[256];
if (uri.TryFormatCanonical(buffer, out int written))
{
    // "https://example.com/caf%C3%A9?q=hello%20world"
    ReadOnlySpan<byte> canonical = buffer.Slice(0, written);
}
// Display form as UTF-8 bytes
if (uri.TryFormatDisplay(buffer, out written))
{
    ReadOnlySpan<byte> displayUtf8 = buffer.Slice(0, written);
}
Both methods write directly to a Span<byte> with no allocation. ToString() is the convenience overload that allocates a string for the display form.
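If the relationship between the two forms is unfamiliar, it can be reproduced for a single URI component with the base class library alone. The sketch below uses Uri.EscapeDataString and Uri.UnescapeDataString. The PercentForms wrapper is a name invented for this example, and these calls allocate strings, so this shows the encoding rule rather than the V5 API.

```csharp
using System;

public static class PercentForms
{
    // Percent-encode one component (a path segment or a query value).
    // Non-ASCII characters are encoded as their UTF-8 bytes.
    public static string Encode(string component) => Uri.EscapeDataString(component);

    // Decode percent-encoded sequences back to readable text.
    public static string Decode(string component) => Uri.UnescapeDataString(component);
}
```

Encoding "café" gives "caf%C3%A9", and decoding "hello%20world" gives "hello world". Note that EscapeDataString is meant for one component at a time: applied to a whole URI it would also escape the separators.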
Why not System.Uri?
System.Uri merges several distinct RFC concepts into a single type. It handles absolute URIs, relative references, and IRIs all through one class, which can be confusing. A method that accepts System.Uri gives no indication of whether it expects an absolute URI, a relative reference, or an IRI. V5 separates these into distinct types (Utf8Uri, Utf8UriReference, Utf8Iri, Utf8IriReference) so the semantic intent is clear at the API boundary.
Beyond the type-safety question, System.Uri allocates a managed string and normalises the URI, which can change its representation. The Utf8 variants validate and decompose the URI in place, with no allocation and no normalisation surprises. For JSON Schema format validation, this means checking whether a string is a valid uri-reference costs nothing beyond the parse itself.
All four types are derived from the .NET runtime's own System.Uri parser, rewritten to operate on UTF-8 spans rather than managed strings.
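Both points about System.Uri are easy to observe. The following sketch uses only System.Uri. The SystemUriBehaviour class and its method names are ours, chosen for this example.

```csharp
#nullable enable
using System;

public static class SystemUriBehaviour
{
    // One type, two meanings: the caller has to inspect IsAbsoluteUri at runtime.
    public static bool IsAbsolute(string text) =>
        Uri.TryCreate(text, UriKind.RelativeOrAbsolute, out Uri? uri) && uri.IsAbsoluteUri;

    // AbsoluteUri returns the normalised form, which may differ from the input text.
    public static string Normalised(string absoluteUri) => new Uri(absoluteUri).AbsoluteUri;
}
```

IsAbsolute returns true for "https://example.com/v1" and false for "v1/users?active=true", yet both values have the same static type. Normalised("HTTP://Example.COM:80/a/../b") returns "http://example.com/b": the scheme and host are lower-cased, the default port is dropped, and the dot segments are collapsed.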
Arbitrary-precision numerics
JSON has no precision limit on numbers. The string 99999999999999999999999999999.123456789 is perfectly valid JSON. But double gives you about 15 to 17 significant digits, and decimal gives you 28 to 29. Digits beyond that are silently rounded away, and a value outside the type's range cannot be represented at all.
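One way to see the loss is to take two JSON numbers that differ only in their final digits and check whether they are still different after parsing. This sketch uses only double.Parse and decimal.Parse from the base class library. PrecisionCheck is a name made up for this illustration.

```csharp
using System.Globalization;

public static class PrecisionCheck
{
    // True when two different number texts collapse to the same double.
    public static bool CollideAsDouble(string a, string b) =>
        double.Parse(a, NumberStyles.Float, CultureInfo.InvariantCulture) ==
        double.Parse(b, NumberStyles.Float, CultureInfo.InvariantCulture);

    // True when two different number texts collapse to the same decimal.
    public static bool CollideAsDecimal(string a, string b) =>
        decimal.Parse(a, NumberStyles.Float, CultureInfo.InvariantCulture) ==
        decimal.Parse(b, NumberStyles.Float, CultureInfo.InvariantCulture);
}
```

"12345678901234567890.123456789" and "12345678901234567890.123456780" collide as double, and "1.23456789012345678901234567891" and "1.23456789012345678901234567892" collide as decimal because they differ only in the 30th significant digit. Neither parse reports a problem.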
In practice, you will almost never need arbitrary-precision types. The vast majority of JSON numbers fit comfortably in int, long, double, or decimal. The right approach is to use the format keyword in your schema to bound your numeric types appropriately. Use "format": "int32", "format": "double", "format": "decimal", and so on. The code generator will then select the matching .NET type, and you get compile-time safety for free.
BigNumber and BigInteger exist for the vanishingly small number of scenarios where unbounded precision is genuinely required. That includes cryptographic values, scientific datasets with extreme precision, or financial interop where the source system sends numbers beyond 28 significant digits.
How V5 handles numbers internally
V5 never converts a JSON number to a floating-point type during validation or comparison. Instead, it parses the raw UTF-8 bytes into normalised components:
| Component | Type | Example for 1.200e3 |
| --- | --- | --- |
| isNegative | bool | false |
| integral | ReadOnlySpan<byte> | "1" |
| fractional | ReadOnlySpan<byte> | "2" |
| exponent | int | 2 |
All comparison and validation operates on these components. A 500-digit JSON number is compared with perfect accuracy.
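To show why this works without any floating-point conversion, here is a simplified sketch of the idea. It is not the V5 implementation: it works on strings rather than UTF-8 spans, it merges the integral and fractional digits into a single digit string, and the names (JsonNumberSketch, Parts, Normalise, Compare) are invented for this example. It also assumes the input is already a syntactically valid JSON number.

```csharp
using System;
using System.Globalization;

public static class JsonNumberSketch
{
    // value = sign × Digits × 10^Exponent, with no leading or trailing zeros in Digits.
    // An empty Digits string represents zero.
    public readonly record struct Parts(bool IsNegative, string Digits, long Exponent);

    public static Parts Normalise(string json)
    {
        bool negative = json[0] == '-';
        int start = negative ? 1 : 0;
        int e = json.IndexOfAny(new[] { 'e', 'E' });
        string mantissa = e < 0 ? json[start..] : json[start..e];
        long exponent = e < 0
            ? 0
            : long.Parse(json[(e + 1)..], NumberStyles.AllowLeadingSign, CultureInfo.InvariantCulture);

        int dot = mantissa.IndexOf('.');
        string integral = dot < 0 ? mantissa : mantissa[..dot];
        string fractional = dot < 0 ? string.Empty : mantissa[(dot + 1)..];

        // Fold the fractional digits into the digit string and adjust the exponent.
        exponent -= fractional.Length;
        string digits = (integral + fractional).TrimStart('0');
        string trimmed = digits.TrimEnd('0');
        exponent += digits.Length - trimmed.Length;

        return trimmed.Length == 0 ? new Parts(false, string.Empty, 0) : new Parts(negative, trimmed, exponent);
    }

    public static int Compare(string left, string right)
    {
        Parts a = Normalise(left), b = Normalise(right);
        int signA = a.Digits.Length == 0 ? 0 : (a.IsNegative ? -1 : 1);
        int signB = b.Digits.Length == 0 ? 0 : (b.IsNegative ? -1 : 1);
        if (signA != signB) return signA.CompareTo(signB);
        if (signA == 0) return 0;
        int magnitude = CompareMagnitude(a, b);
        return signA < 0 ? -magnitude : magnitude;
    }

    private static int CompareMagnitude(Parts a, Parts b)
    {
        // The position of the most significant digit decides first.
        long topA = a.Digits.Length + a.Exponent;
        long topB = b.Digits.Length + b.Exponent;
        if (topA != topB) return topA.CompareTo(topB);

        // Same magnitude: compare digit by digit. With trailing zeros removed,
        // a longer digit string with an equal prefix is the larger value.
        int common = Math.Min(a.Digits.Length, b.Digits.Length);
        int c = string.CompareOrdinal(a.Digits, 0, b.Digits, 0, common);
        return c != 0 ? Math.Sign(c) : a.Digits.Length.CompareTo(b.Digits.Length);
    }
}
```

With this sketch, 1.200e3 normalises to digits "12" and exponent 2, so it compares equal to 1200, and 1e500 compares greater than 9e499 even though neither fits in a double.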
BigNumber and BigInteger
When you do need to materialise an arbitrary-precision value, there are two types. BigNumber handles decimal numbers (with a fractional part or exponent), while BigInteger handles integers of unlimited size:
using Corvus.Numerics;
// Arbitrary-precision decimal
BigNumber decimalValue = element.GetBigNumber();
BigNumber result = decimalValue * 2 + BigNumber.Parse("0.001");
// Arbitrary-precision integer
BigInteger intValue = element.GetBigInteger();
BigNumber stores a BigInteger significand and an int exponent (value = significand × 10^exponent). Both types implement INumber<T> on .NET 9+, so they work with generic math APIs.
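The significand-and-exponent representation makes the arithmetic straightforward to reason about. Here is a minimal sketch of the idea using System.Numerics.BigInteger. DecimalSketch is a hypothetical type written for this post, not the BigNumber source, and it does no normalisation.

```csharp
using System;
using System.Numerics;

// value = Significand × 10^Exponent. Illustrative only.
public readonly record struct DecimalSketch(BigInteger Significand, int Exponent)
{
    // Multiplication: multiply the significands, add the exponents.
    public static DecimalSketch operator *(DecimalSketch a, DecimalSketch b) =>
        new(a.Significand * b.Significand, a.Exponent + b.Exponent);

    // Addition: rescale both operands to the smaller exponent, then add.
    public static DecimalSketch operator +(DecimalSketch a, DecimalSketch b)
    {
        int exponent = Math.Min(a.Exponent, b.Exponent);
        BigInteger left = a.Significand * BigInteger.Pow(10, a.Exponent - exponent);
        BigInteger right = b.Significand * BigInteger.Pow(10, b.Exponent - exponent);
        return new(left + right, exponent);
    }
}
```

For example, 1.5 is (15, -1) and 0.001 is (1, -3), so 1.5 × 2 + 0.001 comes out as (3001, -3), which is 3.001 with no rounding at any step. Because this sketch does not normalise, (12, 2) and (1200, 0) are the same value but not equal as records. A real implementation has to decide how to handle that.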
BigNumber implements IFormattable, ISpanFormattable, and IUtf8SpanFormattable on .NET 9+, and the static formatting methods are available on all targets including netstandard2.0. It works with string interpolation, String.Format, and direct span formatting. All the standard numeric format specifiers are supported:
BigNumber value = BigNumber.Parse("12345678901234567890.123456789");
value.ToString("G", CultureInfo.InvariantCulture); // General: "12345678901234567890.123456789"
value.ToString("F2", CultureInfo.InvariantCulture); // Fixed-point: "12345678901234567890.12"
value.ToString("N0", CultureInfo.InvariantCulture); // Number with grouping: "12,345,678,901,234,567,890"
value.ToString("E3", CultureInfo.InvariantCulture); // Scientific: "1.235E+019"
value.ToString("C", CultureInfo.GetCultureInfo("en-GB")); // Currency: "£12,345,678,901,234,567,890.12"
For zero-allocation formatting, write directly to a UTF-8 byte span:
Span<byte> buffer = stackalloc byte[128];
if (value.TryFormat(buffer, out int bytesWritten, "F2", CultureInfo.InvariantCulture))
{
    ReadOnlySpan<byte> utf8Result = buffer.Slice(0, bytesWritten);
    // Use utf8Result directly - no string allocation
}
Extended numeric types in code generation
The code generator reads the JSON Schema format keyword to select the appropriate .NET type:
| Format | .NET type | Notes |
| --- | --- | --- |
| "int32" | int | |
| "int64" | long | |
| "int128" | Int128 | .NET 9+ only; falls back to long on netstandard2.0 |
| "uint128" | UInt128 | .NET 9+ only; falls back to ulong on netstandard2.0 |
| "half" | Half | .NET 9+ only; falls back to double on netstandard2.0 |
| "single" | float | |
| "double" | double | |
| "decimal" | decimal | |
| (none, type: integer) | long | Default for unformatted integers |
| (none, type: number) | double | Default for unformatted numbers |
For types that are only available on modern .NET, the code generator emits #if NET guards with appropriate fallbacks.
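As a rough picture of what such a guard looks like, here is a hand-written sketch. The class name, the method, and the choice of the NET9_0_OR_GREATER symbol are assumptions made for this example. The generated code may be structured quite differently.

```csharp
using System.Globalization;

public static class WideIntegerSketch
{
#if NET9_0_OR_GREATER
    // Modern targets: the full 128-bit type is available.
    public static System.Int128 Parse(string text) =>
        System.Int128.Parse(text, CultureInfo.InvariantCulture);
#else
    // Older targets: fall back to the widest type that exists there.
    public static long Parse(string text) =>
        long.Parse(text, CultureInfo.InvariantCulture);
#endif
}
```

The trade-off is that the same schema exposes a narrower range on the fallback target, so a value that parses on modern .NET can overflow on netstandard2.0.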
NodaTime integration
If you work with dates and times in .NET, NodaTime is the de facto library for rich date and time handling. It helps you think about your data more clearly and express operations on that data more precisely. V5 includes built-in UTF-8 parsers for ISO 8601 formats that produce NodaTime types directly, without going through DateTime or DateTimeOffset as an intermediate step.
| JSON Schema format | NodaTime type | Example value |
| --- | --- | --- |
| "date" | LocalDate | "2026-05-31" |
| "date-time" | OffsetDateTime | "2026-05-31T10:30:00+01:00" |
| "time" | OffsetTime | "10:30:00+01:00" |
| "duration" | Period | "P1Y2M3DT4H5M6S" |
When the code generator encounters these format keywords, the generated types automatically include NodaTime-typed accessors alongside the standard .NET ones:
// Generated from a schema with "format": "date-time"
OffsetDateTime when = calendarEvent.When.GetOffsetDateTime();
// The standard .NET accessor is also available
DateTimeOffset whenDto = calendarEvent.When.GetDateTimeOffset();
The parsers operate directly on the UTF-8 bytes in the document buffer. There is no intermediate string allocation. The NodaTimeExtensions namespace includes custom implementations of the Gregorian calendar calculations needed for validation, so there's no runtime dependency on the NodaTime NuGet package. The parsing is self-contained.
The NodaTime parsers handle the full complexity of ISO 8601 duration syntax, including fractional seconds, negative durations, and the distinction between date-based periods (P1Y2M) and time-based durations (PT1H30M). The Period type preserves the original components rather than normalising to a single unit, so P1M and P30D remain distinct.
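To illustrate what "preserves the original components" means, here is a deliberately small sketch of a component-preserving duration parser. It is neither the V5 parser nor NodaTime's: it uses a regular expression over a string, it ignores negative durations, and PeriodParts and PeriodSketch are names invented for this example.

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

// Each unit is kept separately; nothing is converted into another unit.
public readonly record struct PeriodParts(
    int Years, int Months, int Weeks, int Days,
    int Hours, int Minutes, decimal Seconds);

public static class PeriodSketch
{
    private static readonly Regex Pattern = new(
        @"^P(?:(?<y>[0-9]+)Y)?(?:(?<mo>[0-9]+)M)?(?:(?<w>[0-9]+)W)?(?:(?<d>[0-9]+)D)?" +
        @"(?:T(?:(?<h>[0-9]+)H)?(?:(?<mi>[0-9]+)M)?(?:(?<s>[0-9]+(?:\.[0-9]+)?)S)?)?\z",
        RegexOptions.CultureInvariant);

    public static bool TryParse(string text, out PeriodParts parts)
    {
        parts = default;
        Match m = Pattern.Match(text);

        // "P" alone and a dangling "T" match the pattern but carry no components.
        if (!m.Success || text.Length < 3 || text.EndsWith('T'))
        {
            return false;
        }

        parts = new PeriodParts(
            Int(m, "y"), Int(m, "mo"), Int(m, "w"), Int(m, "d"),
            Int(m, "h"), Int(m, "mi"),
            m.Groups["s"].Success
                ? decimal.Parse(m.Groups["s"].Value, CultureInfo.InvariantCulture)
                : 0m);
        return true;
    }

    private static int Int(Match m, string name) =>
        m.Groups[name].Success
            ? int.Parse(m.Groups[name].Value, CultureInfo.InvariantCulture)
            : 0;
}
```

Parsing "P1M" yields one month and zero days, while "P30D" yields zero months and thirty days, so the two results are not equal. The letter M means months before the T and minutes after it, which is why "P1M" and "PT1M" also stay distinct.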
Next up
In the next post, we'll look at TOON - a compact text format for JSON-shaped data that reduces token count when working with LLMs.