Generate checks for all forms of UB in Debug and ReleaseSafe modes #2301

andrewrk · 2019-04-18T05:07:45Z

It's always going to be possible to do unsafe things in Zig, because we have inline assembly, @intToPtr, and ability to call extern functions with the wrong ABI.

But what if we could eliminate everything else? What if, after enumerating all kinds of undefined behavior (See #1966), we could make all of them safety-checked, with only some obvious exceptions such as inline assembly?

even @intToPtr could look at the type it was casting to and make sure it is correct, somehow?
runtime safety for pointer lifetimes? can we solve use-after-free? This is the big one.
try to solve everything on the list turned up by enumerate all kinds of undefined behavior, even that which cannot be safety-checked #1966.

This might turn out to be impossible or impractical, but it's worth investigating.

If we could reduce the unchecked undefined behavior surface to a minimum level, there could be auditing/linting tools to point out where unsafety lies in zig software. It would be possible to say something like, "Debug and ReleaseSafe builds of Zig code are safe (in that they crash rather than have undefined behavior), except for inline assembly and extern function ABI mismatch", or some short list of exceptions.

If the cost of these protections is high, that's what we have @setRuntimeSafety for (see #978).

It would be reasonable for these protections to depend on OS-specific behavior, and to be unavailable on some targets, such as freestanding.

The text was updated successfully, but these errors were encountered:

andrewrk · 2019-04-30T05:04:44Z

Interestingly, this may be related to #130. One current area of unsafety is when you use struct embedding and @fieldParentPtr with an enum telling which kind of type something is with a @ptrCast. If you mismatch ids and types, boom. Potentially the interface/vtable/oop feature, if any, that ends up coming out of #130 could solve this safety issue. Or maybe there could be a more general solution.

ghost · 2019-08-27T16:34:01Z

any improvement?

nmeum · 2020-04-22T10:05:21Z

IMHO the important thing to get right in regards to safety is memory safety. Existing publications on this topic often make a distinction between spatial and temporal memory safety, e.g. [1]:

Memory safety has two aspects. Temporal safety is ensured when memory is never used after it is freed. Spatial safety is ensured when any pointer dereference is always within the memory allocated to that pointer.

As it stands currently, I think Zig offers neither spatial nor temporal memory safety. Haven't really used the language much yet but it seems to me that [*]T pointer types allow spatial errors as they do not have bounds information associated with them. Issues regarding temporal memory safety have already been mentioned in this issue (use-after-free). I also believe that temporal memory safety is harder to get right. The obvious solution is garbage collection though that would make Zig unusable on embedded devices. Alternatives include reference counting or Rust-like borrow checking. There is a nice write-up by the person behind the Lobster programming language which briefly discusses the different techniques and describes how the issue is addressed in Lobster.

ngrilly · 2021-03-16T12:07:19Z

Curious about what could be the strategy to make Zig safe against use-after-free of a local function stack variable? Do you think it's possible through static analysis or by crashing at runtime in Debug and ReleaseSafe modes?

matu3ba · 2021-03-23T00:35:07Z

Latest relevant blog post summary by scattered thoughts:

issue	c	zig (release-safe)	rust (release)
out-of-bounds heap read/write	none	runtime	runtime
null pointer dereference	none	runtime⁰	runtime⁰
type confusion	none	runtime, partial¹	runtime²
integer overflow	none	runtime	runtime³
use after free	none	none⁴	compile time
double free	none	none⁴	compile time
invalid stack read/write	none:	none 5	compile time
uninitialized memory	none	none 6	compile time
data race	none	none 7	compile time

optional types
tagged unions, doesn't protect against holding a pointer to value while changing tag
tagged unions
not by default, but available via compiler setting or by linting against unchecked arithmetic
mitigated by "the basic building block of heap allocation, page_allocator, uses a global, atomic, monotonically increasing address hint to mmap, ensuring that virtual address pages are not reused until the entire virtual address space has been exhausted. In practice, this is a very long time for 64-bit applications. The standard library GeneralPurposeAllocator in safe build modes follows a similar strategy for large allocations and for small allocations, does not re-use slots. Similarly, an ArenaAllocator backed by page_allocator does not re-use any virtual addresses." https://lobste.rs/s/v5y4jb/how_safe_is_zig#c_vddk9j

My thoughts
Borrow checking requires fixing the drop order aka RAII, because free cant be borrow-checked (unless one maps and proves the complete program logically): https://users.rust-lang.org/t/how-drop-clear-memory/26388/6
If the drop order is not fixed wrt to allocation order, the borrow checker cant rely on which pointers to memory are valid and which are not. It would need to check all possible combinations how memory can be allocated over the control-flow to check all paths that free (sic).

I realized this after I tried to understand free (called drop in Rust).
Here is how Polonius works: https://nikomatsakis.github.io/rust-belt-rust-2019/#98

Since we would need to require 1.lifetime annotations on functions and types as 2.parseable comments, 3.no union(type compression): Is it feasible to separate a program into memory checked and unchecked parts?

5: Stackoverflows can be fixed via static analysis of recursion and worst case stack over-approximation, which would even give a stronger guarantee than Rust (no stackoverflows). This information can be used for optimization and thus should remain closely tight to the compiler. However, this does not include pointer access to stack.

6: UUM checker, but I believe we can restrict or optimize the analysis due to usingnamespace making usage of symbols explicit. This only leaves potential linker hacks to initialize variables as potential pitifal #9512 and external symbols in linked libraries/C code. Probably the same strategy as 7 (emitting symbols for external analysis) should be done.

7: To decide this (in a simplified model), one either needs graph analysis or deduce all usage combinations.
Its unclear what comptime memory semantics imply on readable comptime values or if it will be possible to do "exit-comptime analysis with comptime" or how much complexity and slowdown that would mean.
Another more simple possibility would be to tag the usage of threading synchronization and collect that information during comptime to emit it separately for analysis tools.
This is probably also the correct strategy for static analysis to keep Zig small, simple, fast to compile and not complicate up the compiler with heavy and slow static analysis.
However, external closed-source static analysis opens up monetary incentive to "introduce bugs as optimization features" and alike things, as corporate users "can just use tool xyz" and dont have the problem.

APPENDUM
Rust has in-baked shape analysis for memory safety and typestate inference.

matu3ba · 2021-03-23T10:32:12Z

Sketch of borrow checking (BC). I dont know yet, if and where I did flawed assumptions. Will probably sleep a few nights overthinking it. Based on Nikos presentation on Polonius.

(I think) The decision for or against datafrog is motivated by

finding and accessing time of each scope block (likely one or two hashsets) that defines (1.variables, 2.lifetime rules, 3. tree that describes relation to other variables).
resolving loop relations on creating the relation tree of types => likely space vs execution time tradeoff.
Keeping code simple to have a sound implementation.

On applying BC (we simplify allocation here):

Global scope variables etc must be obtained and the according tree defining their relations filled. Loops in the relation tree must be detected/handled (how costly with inference rules?) to prevent getting too slow/error out.
For each function:
The Control Flow Automata(CFA) must be constructed, which requires forward traversing all functions (on demand).
The CFA of each function (edges corresponding to statements) must be traversed 2 times:

backwards to annotate lifetimes to the variables
forwards to conduct BC analysis

The BC analysis then must abstract over loops to generate rules and figure out, if any function evaluation resolves to satisfying or unsatisfying lifetime constraints (1.order of how memory is allocated or deallocated, 2.pointer validity with lifetime rules).
Rust solves 1. by using RAII and zig could enforce defer deinit and defer free to the respecting init call. We can also easily generate those rules via lifetime annotations on the functions.

The difficult part is 2: Pointer allow arbitrary indirection complexity and looking up the address the pointer points to crushes cache-locality relatively fast, because we must track the possible indirections over the whole lifetime of the program with annotating and resolving the lifetime rules.
Hence it becomes very fast easier to use a database/key value store for it, which boils down to the Rust approach.

BC Fundamental rule
An memory slice(MS) is defined by base address + bytelength and maps to continuous memory.
We must have in every scope either

exactly one non-const pointer to a MS (mutable borrow of variable) (that is used later)
arbitrary many const pointer to a MS (shared borrow of variable) (that is used later)

BC principles

Loans define the relation of pointers p to variable x (and vice versa)

    var x: u16 = 100; //MS assume base address 0x400, len 2 (byte)
    // ignore for now that p1 and p2 (and x) are not live (dead variable optimisation)

    var p1: *const u16 = &x; //Rust: let p1 = &x //Loan1(L1): p1 borrows x shared, so x can not be written unless p1 goes of out of scope

    var p2: *u16 = &x; //Rust: let p1 = mut &x //Loan2(L2): p2 borrows x mutable, so only p2 can write x. x can be used as usual.

Violating loans

    var x: u16 = 22;
    var p1: *const u16 = &x; //L1: p1 borrows x shared
    x += 1; //error: x is modified, but was borrowed shared => lookup (what subparts) are shared in which mode
    print("y: {}", .{p1.*}); //make sure p1 lives  (see below)

Live variables

    var x: u8 = 1;
    //x is dead here, because it is reassigned before usage
    x = 2;
    //x is live here, because there is no reassignment before usage
    //common static analysis method: Live Variable Analysis (traverse CFA backwarsd)
    print("x: {}", .{x});

Live loan

    const Foo = struct { x: u8 = 1, bar: u8 = 2 };
    //TODO: make it properly
    var foo:Foo = Foo{};
    var x:  = &foo.bar; //L1: x borrows foo mutable
    var y = &foo; //L1: error, can not borrow foo[0,bar.end] => foo[bar.start,bar.end] mutable borrowed

    //y gets used => loans L1 and L2 live
    print("z: {}", .{y.*});

APPENDUM
What I think is worth investigating is in Polonius/Rust borrow checker a way to tell the solver what values to drop during analysis (once they become not needed anymore).

Hadron67 · 2021-04-03T13:44:15Z

Live loan

    const Foo = struct { x: u8 = 1, bar: u8 = 2 };
    //TODO: make it properly
    var foo:Foo = Foo{};
    var x:  = &foo.bar; //L1: x borrows foo mutable
    var y = &foo; //L1: error, can not borrow foo[0,bar.end] => foo[bar.start,bar.end] mutable borrowed

    //y gets used => loans L1 and L2 live
    print("z: {}", .{y.*});

Since Rust borrow checker does not work for self referential types, I doubt if BC of this type is feasible? For example, is the following code disallowed?

const Foo = struct { next: ?*Foo, };
var foo: Foo = .{ .next = null };
foo.next = &foo;// error, foo is already borrowed mutable?

matu3ba · 2021-04-03T15:15:58Z

@Hadron67

This is what Rust has Pin for.
So another referencing Pin types should have the same lifetimes + must not to be moved around in memory after getting placed (by the programmer): https://rust-lang.github.io/async-book/04_pinning/01_chapter.html

So in theory this should be possible, but in practice one would need to compute a compile-time derivable memory layout over the recursive structures. This works fine for lists, smaller/fixed-layout trees and simple stuff, but likely does not work anymore for more complex structures.

(The idea is that you can only set null-pointer to non-null and non-null pointer must not be modified until you clean up recursively the structure.)

By the way, your simplified example is currently broken on master:

const std = @import("std");
const print = std.debug.print;
pub fn main() void {
    const Foo = struct { t1: u8, next: ?*Foo }; //without type t1 zls or zig.vim immediately complains
    var foo: Foo = .{ .t1 = 1, .next = null };
    foo.next = &foo;
}

matu3ba · 2021-04-06T18:58:08Z

Another interesting approach is Wuffs, which can be significantly fast.
"However, unlike hand-written C, Wuffs the Language is safe with respect to buffer overflows, integer arithmetic overflows and null pointer dereferences."

They can only do this, because they require that heap memory owned by pointers does not get fragmented. So programmers must manage pointer offsets to each heap structure themself. Hence one never gets potential pointer soup (pointers pointing to subparts and pointing to other things), which Rust resolves with (the costly) pointer lifetime checking.

So memory handling is relative static and simple, because no subpointers or substructures to same memory region can be created.

Also loop invariants etc must all be annotated, so one does basically proof all invariants manually and gets only the compiler confirmation. So overall this should be fine for anything simpler in terms of memory layout.

Hadron67 · 2021-04-07T16:43:42Z

@matu3ba
Actually the big problem with Rust-style borrow checker and lifetime analyse is they can only handle affine substructural type system, such as simple structs and trees. Wuffs also falls into this category of types that's why it's simple, fast, and easy to analyse. However, when come to non-linear types the analysis become much harder and is very likely impossible to be carried out in general. Example of such types including self referential types, doubly linked lists, graphs, etc. Unfortunately many OS level data structures belong to this type. Dealing with such types in Rust would require unsafe, such as using Pin for self referential types, and Rc<RefCell<T>> for shared mutable objects (Rc and RefCell implementation all contains unsafe block).

So instead, in zig, except some really simple cases such as returning pointer to a stack-allocated variable, I think we should keep the language simple (I don't want to fight with BC even in zig hahaha, and we don't have "unsafe zig") and just do the safety checks at runtime (Debug and ReleaseSafe modes only of course), raise error immediately when a memory access violation happens. This makes the problem much easier to spot than in c or c++.

For more information about the problem with Rust and ownership model, I recommend this article.

(maybe we should consider moving to discussions?)

matu3ba · 2021-04-08T07:30:43Z

@Hadron67 thanks for the great article and summary (though Pin does require unsafe due to move semantics in c++/Rust, which c and zig to my knowledge does not have).

https://zigforum.org/ would be a candidate.

oconnor663 · 2021-05-03T18:15:19Z

How might future safety checks handle lifetime issues involving allocating containers like ArrayList? Here are a couple of proof-of-concept examples, which run without errors in Debug mode currently, but where Valgrind detects invalid reads and writes. (Of course the fact that these examples trigger UB is unsurpsing and well documented, and the question is just how to catch them in Debug mode. Also apologies for unidiomatic code here. I'm totally new to Zig.)

Example 1: reallocation invalidating a pointer

const std = @import("std");

pub fn main() anyerror!void {
    // Make an ArrayList with one element.
    var list = std.ArrayList(i32).init(std.heap.c_allocator);
    defer list.deinit();
    try list.append(0);

    // Take a pointer to that element.
    const ptr = &list.items[0];

    // Add some more elements, to force a reallocation.
    var i: usize = 0;
    while (i < 100) {
        try list.append(0);
        i += 1;
    }

    // BUG: Reallocation has invalidated our pointer, and dereferencing it is
    // now a use-after-free.
    std.debug.print("ptr.* = {}\n", .{ptr.*});
}

Example 2: copying leading to a double free

const std = @import("std");

pub fn main() anyerror!void {
    // Make an ArrayList with one element.
    var list1 = std.ArrayList(i32).init(std.heap.c_allocator);
    defer list1.deinit();
    try list1.append(0);

    {
        // Make another ArrayList with a shorter lifetime.
        var list2 = std.ArrayList(i32).init(std.heap.c_allocator);
        defer list2.deinit();

        // Overwrite list2 with list1. This assignment is a shallow copy.
        list2 = list1;
    }

    // BUG: list2.deinit() has freed list1's heap storage, so reading list1 is
    // now a use-after-free, and list1.deinit() will be a double free.
    std.debug.print("list1 = [{}]\n", .{list1.items[0]});
}

pfgithub · 2021-05-04T04:24:19Z

@oconnor0 The GeneralPurposeAllocator catches these issues in most cases by avoiding reusing memory addresses for as long as possible.

const std = @import("std");

pub fn main() anyerror!void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer std.testing.expect(!gpa.deinit());
    const alloc = &gpa.allocator;
    // Make an ArrayList with one element.
    var list = std.ArrayList(i32).init(alloc);
    defer list.deinit();
    try list.append(0);

    // Take a pointer to that element.
    const ptr = &list.items[0];

    // Add some more elements, to force a reallocation.
    var i: usize = 0;
    while (i < 100) {
        try list.append(0);
        i += 1;
    }

    // BUG: Reallocation has invalidated our pointer, and dereferencing it is
    // now a use-after-free.
    std.debug.print("ptr.* = {}\n", .{ptr.*});
}

Segmentation fault at address 0x7ff11876b000
/home/pfg/Dev/Node/temp/generated/c2eadb992fd9e98a212b0aa4ca835207/tmp/test.zig:24:42: 0x229e09 in main (test)
    std.debug.print("ptr.* = {}\n", .{ptr.*});
                                         ^
/home/pfg/Apps/zig/zig-linux-x86_64-0.8.0-dev.2082+e9e91b4ed/lib/std/start.zig:420:37: 0x22256a in std.start.callMain (test)
            const result = root.main() catch |err| {
                                    ^
/home/pfg/Apps/zig/zig-linux-x86_64-0.8.0-dev.2082+e9e91b4ed/lib/std/start.zig:362:12: 0x20653e in std.start.callMainWithArgs (test)
    return @call(.{ .modifier = .always_inline }, callMain, .{});
           ^
/home/pfg/Apps/zig/zig-linux-x86_64-0.8.0-dev.2082+e9e91b4ed/lib/std/start.zig:316:17: 0x2057c9 in std.start.posixCallMainAndExit (test)
    std.os.exit(@call(.{ .modifier = .always_inline }, callMainWithArgs, .{ argc, argv, envp }));
                ^
/home/pfg/Apps/zig/zig-linux-x86_64-0.8.0-dev.2082+e9e91b4ed/lib/std/start.zig:238:5: 0x2056f2 in std.start._start (test)
    @call(.{ .modifier = .never_inline }, posixCallMainAndExit, .{});
    ^
fish: Job 1, 'zig run test.zig' terminated by signal SIGABRT (Abort)

,

const std = @import("std");

pub fn main() anyerror!void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer std.testing.expect(!gpa.deinit());
    const alloc = &gpa.allocator;

    // Make an ArrayList with one element.
    var list1 = std.ArrayList(i32).init(alloc);
    defer list1.deinit();
    try list1.append(0);

    {
        // Make another ArrayList with a shorter lifetime.
        var list2 = std.ArrayList(i32).init(alloc);
        defer list2.deinit();

        // Overwrite list2 with list1. This assignment is a shallow copy.
        list2 = list1;
    }

    // BUG: list2.deinit() has freed list1's heap storage, so reading list1 is
    // now a use-after-free, and list1.deinit() will be a double free.
    std.debug.print("list1 = [{}]\n", .{list1.items[0]});
}

Segmentation fault at address 0x7f28a6d18000
/home/pfg/Dev/Node/temp/generated/c2eadb992fd9e98a212b0aa4ca835207/tmp/test.zig:24:52: 0x229dcd in main (test)
    std.debug.print("list1 = [{}]\n", .{list1.items[0]});
                                                   ^
/home/pfg/Apps/zig/zig-linux-x86_64-0.8.0-dev.2082+e9e91b4ed/lib/std/start.zig:420:37: 0x22256a in std.start.callMain (test)
            const result = root.main() catch |err| {
                                    ^
/home/pfg/Apps/zig/zig-linux-x86_64-0.8.0-dev.2082+e9e91b4ed/lib/std/start.zig:362:12: 0x20653e in std.start.callMainWithArgs (test)
    return @call(.{ .modifier = .always_inline }, callMain, .{});
           ^
/home/pfg/Apps/zig/zig-linux-x86_64-0.8.0-dev.2082+e9e91b4ed/lib/std/start.zig:316:17: 0x2057c9 in std.start.posixCallMainAndExit (test)
    std.os.exit(@call(.{ .modifier = .always_inline }, callMainWithArgs, .{ argc, argv, envp }));
                ^
/home/pfg/Apps/zig/zig-linux-x86_64-0.8.0-dev.2082+e9e91b4ed/lib/std/start.zig:238:5: 0x2056f2 in std.start._start (test)
    @call(.{ .modifier = .never_inline }, posixCallMainAndExit, .{});
    ^
fish: Job 1, 'zig run test.zig' terminated by signal SIGABRT (Abort)

The c allocator will likely never be able to be fully safe, but there might be things that can be done.

oconnor663 · 2021-05-04T20:33:22Z

@pfgithub a couple questions about the GeneralPurposeAllocator if you have time: If some objects in a page are still alive, can it catch use-after-free bugs related to freed objects in the same page? Or is it only able to do that once all the objects in a page have been freed? Also, could it ever be possible to implicitly switch an application to the GeneralPurposeAllocator (or something that behaves like it) in Debug/ReleaseSafe mode, or will that sort of thing always require explicit debugging code in the application itself?

Edit: Trying your examples just now, it looks like if I add an extra ArrayList that keeps an allocation of the same size alive until the end of main, it affects the debugging output. The first example doesn't catch the use-after-free, and exits without an error. The second example doesn't catch the use-after-free, but does panic on the double-free.

pfgithub · 2021-05-04T20:43:36Z

@oconnor0

The best places for questions like these is in one of the community spaces. I know that people in the discord are very responsive, and the irc is good as well especially for more technical questions about zig.

Also, could it ever be possible to implicitly switch an application to the GeneralPurposeAllocator (or something that behaves like it) in Debug/ReleaseSafe mode, or will that sort of thing always require explicit debugging code in the application itself?

The general purpose allocator is configurable, and the default configuration .{} automatically changes stuff in release-fast to run faster: https://github.com/ziglang/zig/blob/master/lib/std/heap/general_purpose_allocator.zig

Switching to a c allocator in release-fast instead of using zig's release-fast allocator does require explicit application code.

matu3ba · 2021-05-11T07:39:38Z

"MemorySanitizer: fast detector of uninitialized memory use in C++" by Evgeniy Stepanov and Konstantin Serebryany https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43308.pdf

With this approach for checking UUM => 2.5x compiletime cost, 2x memory. However, this approach still includes false negatives (there can be UUM even though the check says there is none).

Symbitic · 2021-12-30T11:54:49Z

One alternative to the "ownership" model of Rust and C++ smart pointers is something much simpler: describe how you intend to use a pointer.
Microsoft did a lot of research on this for Checked C. _Ptr<> indicates a pointer that is dereferenced only. No arithmetic. These actually make up a majority of pointers. When a function will perform arithmetic, then bounds information must be given to indicate what addresses will be used (e.g. int weird_add(_Array_ptr<int> nums : bound(2, 4)) will access nums[2], nums[3, and nums[4]). If the compiler can't prove the access is safe, then it adds runtime checks.

A form of this might make it possible to prevent use-after-free errors. If a function declares what it will do with a pointer, then freeing that pointer can be forbidden within that function.

Note: I'm no compiler guru. I'm just sharing what I learned from following Microsoft's Checked C project.

daurnimator · 2021-12-30T11:57:21Z

@Symbitic that just sounds like slices

whatisaphone · 2023-09-05T23:30:56Z

Another form of UB that isn't currently safety-checked is out-of-bounds indexing into [*]T. What if many-pointers had a secret length field (in safe builds only) that was used for bounds checking? This fits with several other proposals that add secret safety fields: #2414, #1119, #211

Half-baked proposal:

everything in this list applies to Debug/ReleaseSafe only
in safe builds, many-pointers gain a secret safety field for length, similar to how unions have a secret safety field for the tag
- like slices, the secret length field contains the number of bytes accessible starting from the pointer
- unlike slices, the length field is a compiler implementation detail which is not semantically part of the type, and it is not directly readable or writable from user code
many_ptr[i] and many_ptr[x..y] should use the secret length for bounds checks, just like slices
pointer arithmetic (many_ptr + x) will subtract x from the secret length, saturating at 0
creating a many pointer when a length is available (such as slice.ptr) should copy the known length into the manyptr's secret length
creating a many pointer when a length isn't available (such as coercing [*c]T into [*]T) sets the secret length to maxInt(usize), effectively disabling safety checks

I like to use many-pointers instead of slices when I know I won't need to access the length at runtime. But it's sad to have to give up bounds checking in order to do that. This would be the best of both worlds.

matu3ba · 2023-09-06T09:21:48Z

many_ptr[0..new_length].ptr

Quickly looking at many_ptr[0..new_length].len looks like a regular slice, which sounds like a potential footgun when the type is infered via comptime.

If the field is writable from regular code, people might rely on it accidentally and the code breaks in non-debug and non-ReleaseSafe mode due to it not being available.

Also nit: You forgot to mention that the secrent len field only available in debug+ReleaseSafe of a many-ptr is not-readable.

Please post (from your opinion) half-baked proposals not on the issue tracker, but in a community place instead.

whatisaphone · 2023-09-06T16:17:43Z

To clarify, the field being secret means it's not readable or writable from user code. It's not "available" in debug builds either, so I don't see how you could accidentally rely on it. The semantics of the type wouldn't change at all, outside of the new bounds checks (and Zig doesn't have catch_unwind to let the program observe those panics). Can you explain what you mean, maybe with a code example?

I think the idea behind a secret field has been pretty consistent across these issues, but you're right, I could have been explicit. I edited my post. Thanks for the feedback.

nektro · 2023-09-06T18:46:29Z

the secret length can be set directly with many_ptr[0..new_length].ptr – this would be a nop in fast/small builds

if new_length is comptime known then *[N]T is exactly the type you described and already implemented. if new_length is runtime known then you get a slice

whatisaphone · 2023-09-06T19:05:23Z

You both got hung up on a footnote within a footnote. I included that bit for thoroughness sake but now wish I hadn't, it's not important to the idea at all. I removed it.

Anyways in my code I find many pointers/slices are dramatically more common than pointers to arrays. That's true for the compiler/stdlib too:

# pointer to array
$ rg -t zig '\*(const )?\[\d+\]u8' | wc -l
92

# many pointer
rg -t zig '\[\*\](const )?u8' | wc -l
362

# slice
$ rg -t zig '\[\](const )?u8' | wc -l
2217

As you pointed out, the first, uncommon case already has a good solution. This proposal benefits the second and third, much more common cases where the length is runtime-known.

nektro · 2023-09-06T19:24:32Z

I suppose then you're proposing that the size of slices in non-safe modes is the same size of a pointer? (this also might already be the case given @typeInfo but im not 100% sure) because bounds checking is already disabled in those modes for slices. i didn't want to comment on too much because this issue is the wrong place to have this discussion

whatisaphone · 2023-09-06T20:55:57Z

Slices wouldn't change at all. I only mentioned them for comparison.

In fast/small builds, many-pointers also wouldn't change at all. Same thing at a source code level: there are no new programmer-visible semantics (besides panics, which in Zig are not meant to be observed anyway)

In debug/safe builds, the size of many-pointers would double, and a handful of builtin operations would use the secret field for bounds checking. But like Zig's other secret safety fields, the programmer can't see or access it no matter the build mode—only the compiler can during codegen.

The point is to have bounds checking safety in debug builds, and space efficiency in release builds.

	debug bounds checking	release sizeOf (x86_64)
slices	✔️ yes	❌ 16 bytes
many pointers (Zig 0.11)	❌ no	✔️ 8 bytes
many pointers (proposal)	✔️ yes	✔️ 8 bytes

(Should this have been a separate issue?)

Before we fixed this bug, every labeled piece of data was using the image of the 8 (whatever the last training item happens to be). We were referencing a stack allocated variable in the loop which just went to garbled memory after the loop ended (happened to reference the last item of the loop). It would be nice if Zig could warn/error when we tried to reference and store a pointer to some stack-allocated object to avoid this type footgun (ziglang/zig#2301).

andrewrk added the proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. label Apr 18, 2019

andrewrk added this to the 0.6.0 milestone Apr 18, 2019

This was referenced Apr 30, 2019

Support leak detection and race condition analysis #1199

Open

terminology update: use the phrase "detectable illegal behavior" rather than "safety-checked undefined behavior" #2402

Open

add safety checks for pointer casting #2414

Open

andrewrk mentioned this issue Jun 10, 2019

Add Compiler Error when returning pointer to stack variable #2646

Open

andrewrk mentioned this issue Aug 27, 2019

question: how much compile time memory safety will Zig provide? #3126

Closed

andrewrk mentioned this issue Sep 5, 2019

memory safety when use-after-free occurs of a local function stack variable #3180

Open

andrewrk added this to To do in Safety Oct 17, 2019

andrewrk modified the milestones: 0.6.0, 0.7.0 Feb 10, 2020

andrewrk added the accepted This proposal is planned. label Oct 9, 2020

andrewrk modified the milestones: 0.7.0, 0.8.0 Oct 9, 2020

andrewrk modified the milestones: 0.9.0, 0.10.0 Nov 20, 2021

kassane mentioned this issue Jan 25, 2022

Support generation of Zig bindings mozilla/cbindgen#732

Open

7 tasks

This was referenced Feb 20, 2022

Uncaught undefined behavior when returning pointers to stack memory. #10936

Closed

the dangling pointer is allowed. #10945

Closed

andrewrk modified the milestones: 0.10.0, 0.11.0 Apr 16, 2022

ominitay mentioned this issue Apr 17, 2022

Integer coercion consideres non-type bits #11263

Open

kuon mentioned this issue Jan 2, 2023

Warn/error when returning stack variable references #14156

Closed

andrewrk modified the milestones: 0.11.0, 0.12.0 Apr 9, 2023

squeek502 mentioned this issue Jun 7, 2023

Segfault in GeneralPurposeAllocator #15971

Closed

andrewrk modified the milestones: 0.13.0, 0.12.0 Jun 29, 2023

ifreund mentioned this issue Jul 21, 2023

Debug and ReleaseSafe different output #16467

Closed

rohlem mentioned this issue Sep 1, 2023

For loop range is unusable with types other than usize #17039

Open

rohlem mentioned this issue Oct 13, 2023

Proposal: disallow direct assignment to x.? #17508

Open

MadLittleMods mentioned this issue Oct 19, 2023

Fix MNIST example not training (pointer to de-referenced stack memory) MadLittleMods/zig-ocr-neural-network#12

Merged

Jarred-Sumner mentioned this issue Oct 20, 2023

fix findBestMatch so it finds the best match and not the first match oven-sh/bun#6611

Merged

4 tasks

Inve1951 mentioned this issue Apr 5, 2024

globals fields segfault (or write fault in os.zig) #19546

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Generate checks for all forms of UB in Debug and ReleaseSafe modes #2301

Generate checks for all forms of UB in Debug and ReleaseSafe modes #2301

andrewrk commented Apr 18, 2019 •

edited

andrewrk commented Apr 30, 2019

ghost commented Aug 27, 2019

nmeum commented Apr 22, 2020 •

edited

ngrilly commented Mar 16, 2021

matu3ba commented Mar 23, 2021 •

edited

matu3ba commented Mar 23, 2021 •

edited

Hadron67 commented Apr 3, 2021

matu3ba commented Apr 3, 2021 •

edited

matu3ba commented Apr 6, 2021 •

edited

Hadron67 commented Apr 7, 2021

matu3ba commented Apr 8, 2021 •

edited

oconnor663 commented May 3, 2021 •

edited

pfgithub commented May 4, 2021 •

edited

oconnor663 commented May 4, 2021 •

edited

pfgithub commented May 4, 2021 •

edited

matu3ba commented May 11, 2021 •

edited

Symbitic commented Dec 30, 2021

daurnimator commented Dec 30, 2021

whatisaphone commented Sep 5, 2023 •

edited

matu3ba commented Sep 6, 2023 •

edited

whatisaphone commented Sep 6, 2023

nektro commented Sep 6, 2023

whatisaphone commented Sep 6, 2023

nektro commented Sep 6, 2023 •

edited

whatisaphone commented Sep 6, 2023

Generate checks for all forms of UB in Debug and ReleaseSafe modes #2301

Generate checks for all forms of UB in Debug and ReleaseSafe modes #2301

Comments

andrewrk commented Apr 18, 2019 • edited

andrewrk commented Apr 30, 2019

ghost commented Aug 27, 2019

nmeum commented Apr 22, 2020 • edited

ngrilly commented Mar 16, 2021

matu3ba commented Mar 23, 2021 • edited

matu3ba commented Mar 23, 2021 • edited

Hadron67 commented Apr 3, 2021

matu3ba commented Apr 3, 2021 • edited

matu3ba commented Apr 6, 2021 • edited

Hadron67 commented Apr 7, 2021

matu3ba commented Apr 8, 2021 • edited

oconnor663 commented May 3, 2021 • edited

pfgithub commented May 4, 2021 • edited

oconnor663 commented May 4, 2021 • edited

pfgithub commented May 4, 2021 • edited

matu3ba commented May 11, 2021 • edited

Symbitic commented Dec 30, 2021

daurnimator commented Dec 30, 2021

whatisaphone commented Sep 5, 2023 • edited

matu3ba commented Sep 6, 2023 • edited

whatisaphone commented Sep 6, 2023

nektro commented Sep 6, 2023

whatisaphone commented Sep 6, 2023

nektro commented Sep 6, 2023 • edited

whatisaphone commented Sep 6, 2023

andrewrk commented Apr 18, 2019 •

edited

nmeum commented Apr 22, 2020 •

edited

matu3ba commented Mar 23, 2021 •

edited

matu3ba commented Mar 23, 2021 •

edited

matu3ba commented Apr 3, 2021 •

edited

matu3ba commented Apr 6, 2021 •

edited

matu3ba commented Apr 8, 2021 •

edited

oconnor663 commented May 3, 2021 •

edited

pfgithub commented May 4, 2021 •

edited

oconnor663 commented May 4, 2021 •

edited

pfgithub commented May 4, 2021 •

edited

matu3ba commented May 11, 2021 •

edited

whatisaphone commented Sep 5, 2023 •

edited

matu3ba commented Sep 6, 2023 •

edited

nektro commented Sep 6, 2023 •

edited