Support switch on string variable

whoops! Thank you friend.

I definitely prefer Zig’s more manual, explicit and clear style, over something that looks fancy, but hides complexity from you, by adding a mountain of constructs.

cool, I def see that.

Each question has a question behind the question.

I personally look at each character in code as labor and cost - the more you have, the more expensive it is. I want the tightest code base possible.

Code is Cost
Without some form of sugar, string comparison costs more time whenever the thing you are comparing changes: the developer (or an AI, one day) has to mindlessly propagate updates through if / else if conditions.

When a language says “solve this yourself”, it creates tribes. And macros, infamously, create tribal sprawl.

That’s why I think switch (string) is really a request for a better if / else if chain. People just want something simpler.

What do people desire?

People want clarity, and overly explicit code can be less clear. For example, type inference is preferred, as in const a = "abc" ++ g_str_suffix, even though there is compiler magic. Zig does great.

People want zig fmt and ZLS to help solve problems - ex ZLS will render the implicit types above. Zig does great…

To me, the ask has less to do with strings and more to do with complex conditionals. Look at the current state:

if cmplx-left-a == right-a { 
   ...
}
else if cmplx-left-a == right-b or cmplx-left-a == right-c or cmplx-left-a == right-d { 
   ...
}
else if cmplx-left-a == right-e { 
   ...
}
else {
   ...
}

When cmplx-left-a changes, it is a pain: every branch that repeats it has to be updated.

consider the below…

if cmplx-left-a == right-a { ... }
             or == right-b or == right-c or == right-d { ... }
             or == right-e { ... }
else {
  ...
}

A. zig fmt and ZLS can help structure and format the code, revealing inferences to devs
B. if you need to update cmplx-left-a, you do it once.

That is weird syntax that doesn’t follow any of the existing rules of the language: it reuses or as a syntactic marker instead of an operator.
It is basically a match statement that requires you to repeat or everywhere you refer back to the cmplx-left-a value.

If I were to re-imagine a match statement I would write it like this:

match (cmplx_left_a) {
    == right_a => {},
    == right_b, right_c, right_d => {},
    == right_e => {},
    else => {},
}

This syntax is also different than existing things, but I think it has at least a little bit more consistency with existing switch cases.

Then if it allowed == <= >= < > != and comptime known 2-argument functions that return a boolean, it could be neat. (But it would be the first time that a control structure can be parameterized with comptime functions):

const eq = std.mem.eql(u8); // maybe allow currying for comptime parameters?
match (cmplx_left_a) { // assuming these are all string variables
    eq right_a => {},
    eq right_b, right_c, right_d => {},
    eq right_e => {},
    else => {},
}

While I think something somewhat like this could eventually be added to Zig, I am not holding my breath / seeing it as something that has a high likelihood.

My personal stance on fancy features like this is that I think it is fine if Zig never gets these, because I think eventually the people who want to use these features will end up creating their own dialect scripting language anyway and with the build system it should be possible to combine parts written in that dialect with regular Zig.

First, thanks for all the hard work on zig! zig is truly amazing.

Real world use case - net / TCPIP packet parsing, where there is one-off quiescence of string input to an enum literal.

It is super convenient for middleware libs, so teams can tuck in the ‘mini scenarios’ with reduced maintenance.

Given a "low level" TCP/API response (essentially, a string)…

  • The API response (a string) might be populated with convention without standard - even given an RFC - with slightly different vendor responses for the same response concept, so teams are continually learning what the full set of responses are, over time.

  • it can be really handy to pull the string comparisons out of enum land and lay it into the quiescence logic, connecting the string thing to the enum result.

  • Ultimately, enum enforcement is the goal, so one may not want to keep even temp enums around.

func parseApiRv(s:String) -> [ApiRecords] {

// split s into [records]
// ...each record containing one or more  [key/key metadata data/value] data
// ...said data is identified by strings themselves.

switch (key) {
  // Identify File Type
  case "type":
     switch (value) {
       case "file":
         rec_type = .FILE
         file_ext=(file_name as String).pathExtension.lowercased()
                                
          switch (file_ext) {
             case "a": file_type = .A
             case "b": file_type = .B
             default: file_type = .UNKNOWN
           }

        case "dir":
          rec_type = .DIR
          file_type = .DIR
          file_ext=""  ...

        default: append = false
     } 
  case "size", "sz", "sized":
    ...

there are other ways to skin this cat but wanted to share the use-case for consideration.

Most needs for mapping strings (to enums or bits of code) are better accomplished using some kind of lookup (hash table, trie, radix tree, depending on how variable the domain of strings are, and whether they are predetermined). Often the lookup table can be generated at comptime, and in fact Zig has things to accomplish this.
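As a minimal sketch of such a comptime-generated lookup, here is std.StaticStringMap applied to the record keys from the use case above. The Key enum, the key_map table and lookupKey are hypothetical names made up for illustration, and this assumes a Zig version that ships std.StaticStringMap:

```zig
const std = @import("std");

// Hypothetical record keys; the names are illustrative only.
const Key = enum { kind, size };

// Built once at compile time. Several vendor spellings can map
// to the same enum value.
const key_map = std.StaticStringMap(Key).initComptime(.{
    .{ "type", .kind },
    .{ "size", .size },
    .{ "sz", .size },
    .{ "sized", .size },
});

// Returns null for any string that is not in the table.
fn lookupKey(s: []const u8) ?Key {
    return key_map.get(s);
}

test "vendor spellings collapse to one enum value" {
    try std.testing.expect(lookupKey("sz").? == .size);
    try std.testing.expect(lookupKey("sized").? == .size);
    try std.testing.expect(lookupKey("type").? == .kind);
    try std.testing.expect(lookupKey("colour") == null);
}
```

When a vendor shows up with a new spelling, the change is one extra row in the table, and the rest of the code keeps switching on the enum.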

Other languages hide the construction of those lookups behind switch syntax and pattern matching. However, there has been a lot of experience with the benefits of Zig not hiding complexity. There is no good way to know, or to control, how the language will transform the switch into some lookup/dispatch construct, so you may write something that appears simple syntactically but generates extremely expensive or non-performant code. When that happens you’ll need to decide how the lookup is best implemented for your data anyway.

Agree. There’s also the consideration that Zig aims to be general purpose. It’s cool if you can afford to make a syntax-level switch on strings, and it probably doesn’t matter as much what the codegen is for x86_64 or AArch64, but what about AVR? Arm? PIC? RISC-V, etc.? There are places where low complexity up front (aka in the language) pays dividends. Zig is right to be very adamant about being KISS friendly.

true - utf8/16 and u64 are also rough in hw limited scenarios. For ex, for my zig playgrounds, I gave up on 32-bit compatibility–for now; there is just nobody there to use them, and those that do, I presume, know how to get things working.

HW limited scenarios seem to need proprietary / tuned logic for specific scenarios, no matter what, so it seems better (for me) to focus on the bulk of the use-case: 64-bit ARM / x86. I am having problems just getting f(x)=n to produce n on Linux v MacOS v Windows consumer hw! :slight_smile:

I don’t know that I would make grammar decisions based on the smallest unit of compute - not that I would exclude small compute, but when there is a workaround for small and sweet sweet sugar for majority…

At the end of the day, most developers are comfortable abstracting decisions to the compiler. It is the evolution of ASM -> C -> C++ -> GC grammars. Most realize under the hood, decisions get made, and in this thread are a set of solutions including nil; none need be approached hastily.

For me - I wouldn’t want people solving problems compilers can solve better and more efficiently.

For a lot of problems it is not at all evident that the compiler is actually, reliably doing a better job. Some languages simply don’t care: they would rather give you something so people stop complaining, regardless of whether people end up writing and using worse code and worse programs.

Yes, plenty of programmers are fine with creating pessimizations, but that doesn’t make it good that many would rather “never have to think about it” than be inconvenienced with the choice. Some of the early users of computers would probably weep if they saw how inefficiently and wastefully today’s computers are used on average. Where they had to wait mere seconds for the software to finish, today it takes minutes, doing more unnecessary things.

I am tired of hearing this “many probably don’t care” as if it was an argument for anything.

Is it great that more people can create software? yes
Is it problematic that less people on average know or care to create well performing software? yes
Is it great that I can play some game written in python? yes
Do I want to wait 2 minutes before it opens, because it is terribly written? no

This whole idea that the compiler is smarter is flawed. The compiler is only as smart as the compiler team’s ability to predict specific scenarios and make certain tradeoffs, and it has to work a whole lot harder than the programmer who writes a concrete program, because the latter always knows more (or at least has the potential to know more). Also, if you make your compiler too smart, then it takes ages to finish compiling, and it may not even give you better code (maybe sometimes, but not always).

All good points … questions to consider

  1. How long does it take to teach and educate?
  2. How much does it cost to keep?

A key input in industry function is time t, a key driver in resource cost c.

How long does it take to hire a skill? How long to skill an unskilled hire? How long until a skilled hire is proficient? How long until feature a is producing value, and was the value worth the sunk time t → cost c ?

People are the key; at scale, least-common-denominator (lcd) wins, regrettably. 3, 30, 300, 3000 → 3M… the lcd is a sliding scale, growing ever more least with each scale factor. Think adoption chasm.

These problems existed 100% in the early days of computing, where complex obscurity (even if better) is replaced with simple clarity (even if worse); 100% the reason why platforms have evolved, and why increase in compute is rapidly consumed by platforms that reduce t / c.

You make some good points, but you also have to keep in mind the initial goal and mission statement of Zig, which is to make a better C. If you want to be a better C, you have to be a better C wherever C already is, which means embedded can’t be elided just because it’s quite niche.

If Zig wants to be a better C (which it already is in many regards), it has to be a better C everywhere. So while switching on strings is convenient, it’s also not that big of a deal, and nothing prevents you from emulating it in userland by using a hashmap or a StaticStringMap.
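Besides the map types named above, another userland route is std.meta.stringToEnum, which hands you back an ordinary exhaustive switch. A small sketch, with RecType and describe as made-up names and the assumption that the accepted strings match the enum field names exactly:

```zig
const std = @import("std");

// Hypothetical record types; the field names double as the accepted strings.
const RecType = enum { file, dir };

fn describe(value: []const u8) []const u8 {
    // stringToEnum returns null when no field name matches.
    const rec = std.meta.stringToEnum(RecType, value) orelse return "unknown";
    // From here on it is a regular switch on an enum, checked for exhaustiveness.
    return switch (rec) {
        .file => "regular file",
        .dir => "directory",
    };
}

test "string to enum, then switch" {
    try std.testing.expectEqualStrings("regular file", describe("file"));
    try std.testing.expectEqualStrings("directory", describe("dir"));
    try std.testing.expectEqualStrings("unknown", describe("socket"));
}
```

The string-to-enum step is a visible function call, and adding a field to RecType makes the compiler point at every switch that needs a new prong.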

Also, one of the perks of keeping things simple is that it will help Zig compete with C in the long run. If the language remains simple, then it will be very easy to implement a Zig compiler, just like right now you can implement a rudimentary C compiler in about 10-15k LOC. It would be cool if you could do the same for Zig.

Because, at least in embedded, if you want to succeed, your language has to be simple, and your compiler too: a lot of software depends on your language not being too complicated, things like proof tools, static analyzers, etc.

This doesn’t happen to be one of those cases. The myth of the ‘sufficiently smart compiler’ is pernicious and we need to put it out to pasture.

LLVM, which is the best we have, is not ‘sufficiently smart’ to do an optimal, or even good job, with complex switches on numeric values. As a simple example, in ezcaper, I broke up a switch into two levels based on powers of two, something I know computers are good at, because the ASM for one huge switch statement was terrible. LLVM wasn’t able to figure out “hey let’s start by doing a clz and making a jump table, then we can optimize each power of two a bit better”, I had to do that for it.
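A rough sketch of that two-level shape, to make it concrete. This is not the actual ezcaper code: needsEscape is a hypothetical name and the escape rules are invented for illustration. The outer switch is on the number of significant bits (computed with @clz), the inner switches only deal with values inside one power-of-two bucket:

```zig
const std = @import("std");

// Hypothetical rule set: which code points would need escaping.
fn needsEscape(cp: u21) bool {
    // Number of significant bits: 0 for cp == 0, up to 21 for the largest values.
    const bits: u5 = 21 - @as(u5, @clz(cp));
    return switch (bits) {
        0...5 => true, // 0x00...0x1F, treated as control characters here
        6 => cp == '"', // 0x20...0x3F
        7 => switch (cp) { // 0x40...0x7F
            '\\', 0x7F => true,
            else => false,
        },
        8 => switch (cp) { // 0x80...0xFF
            0x80...0x9F => true,
            else => false,
        },
        else => false, // everything above 0xFF passes through in this toy rule set
    };
}

test "two-level switch on bit length" {
    try std.testing.expect(needsEscape(0x0A));
    try std.testing.expect(needsEscape('"'));
    try std.testing.expect(needsEscape('\\'));
    try std.testing.expect(needsEscape(0x85));
    try std.testing.expect(!needsEscape('a'));
    try std.testing.expect(!needsEscape(0x1F600));
}
```

Whether this beats one flat switch depends on the data and the target, which is the point being made: it is a decision the programmer makes by looking at the generated code.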

My solution also is far from optimal, it hits a balance between good performance and one-pass generation from the Unicode database with no further tweaking. There’s a deep literature on how to switch on Unicode and I’ve done original research there. Just implementing the stuff is mindbending, imagine adding optimization passes which determine that one approach or another is optimal.

That’s not even possible in the general case, because the optimal approach for so many algorithms depends on the data, not the code. That’s definitely the case with switching on strings.

What Zig does is offer a good all-rounder algorithm in the form of a StaticStringMap, but is this optimal? Not always! There are dozens of data structures for the purpose.

That’s the best a compiler could do: offer syntax sugar which always compiles to a StaticStringMap. That’s bad, because baking it into the syntax of the language strongly biases using it, even when it’s not appropriate.

There are languages with different priorities, where doing that makes sense. In my opinion we have enough of those, and Zig is staking out a position in a valuable and under-resourced area of the programming language state space.

If Zig wants to be a better C (which it already is in many regards) it has to be a better C everywhere.

Excellent point…I try to be mostly mindful of that goal when posting here :slight_smile: but not always.

String handling is one of those soft boundaries. What you are really saying, I think, is the problem is bigger than strings.

Obv, we don’t want to go back to strcmp.

Zig’s better C is how the grammar aligns concepts:

if ( type_str == "file 🥰" ) {...} 
if ( type_int == 999999999 ) {...}

However, it is natural for folks coming from the GC world to wonder:

//can do
switch (type_int) { 
    999999999 => { ... }
}

//can't do 
switch (type_str) {
   "file 🥰" => { ... }
}

I was here. Personally, the question of switch ([]u8) seems like a great roadmap item.

  • Do I think switch (str) is the HIGHEST priority for zig? Of course not.
  • Do I think switch (str) as sugar for if=>else chain is terrible for MVP? No.
  • Do I think switch (str) as sugar for if=>else chain is best? Nope.

As the grammar matures, as the compiler performs, might be good to put the problem on the roadmap for a future solution. You raise an interesting point. I think someone else said…it’s bigger than strings.

A switch statement is a selection control algorithm:

  • Short hand to connect an expression e,
  • to logic l
  • when e is exactly one or more potentials p per some operator (==) .

For example, given a complex type t, perhaps the question is - how could zig solution the problem itself - in the same way one has the choice of different allocators, why not the choice of different selection-control algorithms? Take some of the work out of the compiler itself, into code/libs/so/as that are then compiled.

Suppose we had a keyword via to indicate: for this switch, use the control methods defined in some structure to do the work.

if (big_os) {  // functionally unlimited ram/cpu  O(1)
  algo=std.str.switch.hash(some_alloc);
  defer algo.deinit();
} 
else { //embed case O(n)
  algo=std.str.switch.chain;
}

switch (type_str) via algo  { ... }  // this code works on all platforms

then, if you had some other t.

my_type_algo=my_type_algo_init_fn(allocator,p1,p2);
defer my_type_algo.deinit();

switch (my_type) via my_type_algo {
     type_literal_1, type_literal_2 => { ... } 
}

Designing this I would presume to be bigger than a breadbox; there are a lot of magic black boxes in the above. But, it would be pretty tight…portable? Too much magical thinking? Would it be better for embed cases?

Right at the top of ziglang.org, the very first bullet point reads:

  • No hidden control flow.

That would be hidden control flow. It would mean that any time I’m reading a switch statement, I have to stop what I’m doing and go read some specialized via statement to figure out what’s going on. That isn’t realistic, so just like operator overloading, I’m just going to assume that the implementer knows what he or she is doing, and I’m going to be wrong, and there will be pain.

Only function calls are function calls in Zig. That’s a profound choice with many consequences, it’s not without its tradeoffs, but I consider it basically correct, and if I didn’t, I would need to use another language, because it’s fundamental to this one.
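For comparison, one way to get a selectable lookup strategy without new syntax is to pass it as an ordinary function, so the call stays visible at the use site. This is only a sketch under my own assumptions: Kind, chainLookup, tableLookup and label are hypothetical names, not anything from std or from the proposal above.

```zig
const std = @import("std");

const Kind = enum { file, dir, unknown };

// Strategy 1: linear chain of comparisons, no table at all.
fn chainLookup(s: []const u8) Kind {
    if (std.mem.eql(u8, s, "file")) return .file;
    if (std.mem.eql(u8, s, "dir")) return .dir;
    return .unknown;
}

const table = std.StaticStringMap(Kind).initComptime(.{
    .{ "file", .file },
    .{ "dir", .dir },
});

// Strategy 2: comptime-built map.
fn tableLookup(s: []const u8) Kind {
    return table.get(s) orelse .unknown;
}

// The strategy is a plain comptime parameter; using it is an explicit call,
// and the switch itself is still a switch on an enum.
fn label(comptime lookup: fn ([]const u8) Kind, s: []const u8) []const u8 {
    return switch (lookup(s)) {
        .file => "F",
        .dir => "D",
        .unknown => "?",
    };
}

test "same switch, two strategies" {
    try std.testing.expectEqualStrings("D", label(chainLookup, "dir"));
    try std.testing.expectEqualStrings("D", label(tableLookup, "dir"));
    try std.testing.expectEqualStrings("?", label(tableLookup, "socket"));
}
```

A reader of label sees lookup(s) and knows a function runs there; nothing about the switch itself changes meaning.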

I mean there are valid reasons why you would want that. I guess all that I’m trying to say is that Zig isn’t a language where it would make a lot of sense. Here, the switching on string topic is not even that important to the more profound discussion.

The point is, if you try to do everything at once, you ain’t going to do each thing well, and you will end up becoming C++. Aka, a convoluted mess of vision lacking features stitched together that pretends to be a language.

Instead, I feel like it’s better to have languages that, while retaining some aspect of “general purposefulness”, are specialized. So, if you want switch on string, then Rust is a great alternative. Go too.

But the value proposition of Zig is a low complexity language that extends the C philosophy of being as close to the machine as possible, while offering conveniences that don’t impact your ability as the programmer to control what your program is doing.

This is the most compelling feature of Zig and why I enjoy it more than Rust (not to criticize Rust in the slightest, I love it as well). Because in C the compiler trusts you way too much and it’s too easy to mess things up. In Rust it’s very hard to mess things up, but the tradeoff is that it really requires a lot more planning ahead. If that makes sense, Zig is that perfect middle ground (at least for me) where the language is very strict about the type system, but if you really want or need to, you can bypass basically all of the safety and protection and go back to doing C.

If Zig were to implement switching on string, or any other very high-level features, suddenly I lose control, and I lose the ability to be C again. Because I don’t know what the compiler will do for me: it might be optimal machine code, but it might also not be. It might be really good code on x86 but completely explode the binary size on AVR.

What’s great about Zig is its vision and dedication to be its own thing. This is why I’m more confident in its long-term success and why I invest early in learning it. But if it starts to become a C++ bis where a ton of features are implemented for the sake of it, without thinking about the overall vision and long-term goals of the language, then there will be no point to Zig.

cool, that is an interesting take. As a thought experiment, how about the below?

I’m reading an alloc statement, I have to stop what I’m doing and go read some specialized allocator description

Why is a switch algorithm mechanism less transparent than allocation? Isn’t switch, at some level, hiding control flows? It kind of is…right? That’s the point - we don’t want to know.

Zig is going to encounter this question countless times - the GC world (py/swift/c#/go/java) et al.

Where does one optimize selection control? If a compiler function, then that is one answer. Or, is it in user-land, then that is another answer; I think the best case is a little of A, a little of B.

Selection control does seem like a decidable problem, one that could be defined to be bigger than strings/arrays, or narrowed to it, one that is sub-optimal in C (for sure) and one could argue could be better in zig (nobody has said - this is a terrible idea, it’s the implementation deets that is the convo focus).

That’s not the problem here. The problem is that adding this would result in another kind of function call, instead of precisely one. We all know anything goes inside a function call. But in Zig, a dot access is never a function call, addition is never a function call, try is never a function call, assignment is never a function call, and a switch is never a function call.

That’s not a negotiable premise in this language. It just isn’t.

If Zig were to implement switching on string, or any other very high-level features, suddenly I lose control, and I lose the ability to be C again.

What if it was your switching algorithm? Would that be less interesting, or give you less control?

Interesting take.

But the value proposition of Zig is a low complexity language that extends the C philosophy of being as close to the machine as possible, while offering conveniences that don’t impact your ability as the programmer to control what your program is doing.

For sure…thank you for sharing.

Ultimately, for me…Zig has this collection of sweetness (str=="a 🥰"), and then there are these random sharp edges (switch (str)) … hmm. Not a boo. Just a hmm.

I definitely see the future of computation as being less human. Where compilers like zig are written not for people, but for machines, and people write looser and looser specifications to be interpreted by the machine, for the machine to compile, for the machine to verify, for the person to validate - a world where there is much less control. At some level, what better programmer of computation than the computer itself?

This world is not one where control will be a feature, but a bug. So I understand why control is important.

I definitely do have an affinity for Swift … it is one of the sweetest grammars there are. And has some annoying qualities (“this expression is too complex” → WTH). So I see zig, I think hmm, what would soften some of my perceived sharp edges. Well, I know that cake over there…pretty yummy! :slight_smile:

That’s not the problem here

cool beans…what’s your take? How do you see if / else if / else if / else chains?

I see them as bug prone and problematic.

I have to be honest, I am not sure I am tracking the function call analogy, and that is 100% a limitation on my end.

Like, I look at the below…what is happening? How does this work?

 switch (x) {
        -1...1 => {
            x = -x;
        },
        10, 100 => {
            //special considerations must be made
            //when dividing signed integers
            x = @divExact(x, 10);
        },
        else => {},
    }

I know I like the above a lot better than C. It feels real tight. However, I am really not tracking why arrays and structures are out / unsolvable, while primitives are in…I feel like I am missing something big.

Maybe not missing, but you should try to godbolt this kind of switch statement alongside some string matching or a Rust match statement, and you will see the difference. I’m not saying it’s good or bad, to each their own, but something like ranges of integers often compiles down to a jump table or some bit shifting with a simple mask, something anyone programming in assembly would probably do anyway. It’s very intuitive, whereas with strings there are many open-ended questions that you can’t necessarily reason about locally.

This kind of switch statement on ranges is very intuitive for anyone experienced in a C-like language; you can guess what it will look like in the end. With strings it’s not the same. Does it optimize for small strings by representing the first few bytes as an integer for quick comparison? Does it call a strcmp intrinsic? Does it match on ASCII characters? UTF-8? Unicode? What about when there is a mix of formats? There are a lot of questions that you simply can’t answer just by looking at it.
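To show what the "first few bytes as an integer" option looks like when you write it by hand, here is a small sketch. pack, classify and the constants are hypothetical names; the scheme (length in the low byte, up to 7 bytes above it) is just one arbitrary choice among many, which is exactly the kind of decision a built-in string switch would make for you silently:

```zig
const std = @import("std");

// Pack a string of at most 7 bytes into a u64: length in the low byte,
// then the bytes themselves. Longer strings return null.
fn pack(s: []const u8) ?u64 {
    if (s.len > 7) return null;
    var k: u64 = @intCast(s.len);
    for (s, 0..) |byte, i| {
        const shift: u6 = @intCast((i + 1) * 8);
        k |= @as(u64, byte) << shift;
    }
    return k;
}

// Container-level constants are evaluated at compile time,
// so they can be used as switch prongs.
const k_file = pack("file").?;
const k_dir = pack("dir").?;

const Kind = enum { file, dir, unknown };

fn classify(s: []const u8) Kind {
    const k = pack(s) orelse return .unknown;
    // An ordinary integer switch: one 64-bit compare per candidate.
    return switch (k) {
        k_file => .file,
        k_dir => .dir,
        else => .unknown,
    };
}

test "short strings compared as integers" {
    try std.testing.expect(classify("file") == .file);
    try std.testing.expect(classify("dir") == .dir);
    try std.testing.expect(classify("fil") == .unknown);
    try std.testing.expect(classify("directory") == .unknown);
}
```

It works nicely for short ASCII keys and falls apart for long ones or for anything needing Unicode normalization, so it is a choice for a particular data set, not a general answer.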

And while I would agree that in 90% of cases it’s not going to be your application’s bottleneck and it won’t matter, in some cases it does, and when it does, that immediately disqualifies the language construct. That means there’s no longer one obvious way to do something, because now you have to consider the impact of one construct versus another.

No it doesn’t, you can’t:

pub fn main() !void {
    const str = "hello";
    if (str == "world") {
        std.debug.print("yay!\n", .{});
    }
}

Output:

temp22.zig:3:13: error: cannot compare strings with ==
    if (str == "world") {
        ~~~~^~~~~~~~~~
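For completeness, a minimal sketch of the comparison that does compile: slices are compared by content with std.mem.eql. The greet function and its return strings are made up for illustration.

```zig
const std = @import("std");

fn greet(str: []const u8) []const u8 {
    // std.mem.eql compares lengths first, then the bytes.
    if (std.mem.eql(u8, str, "world")) {
        return "yay!";
    } else if (std.mem.eql(u8, str, "hello")) {
        return "hi!";
    } else {
        return "?";
    }
}

test "compare strings by content" {
    try std.testing.expectEqualStrings("yay!", greet("world"));
    try std.testing.expectEqualStrings("hi!", greet("hello"));
    try std.testing.expectEqualStrings("?", greet("zig"));
}
```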

thank you Sze! oh my goodness … apologize to all for wasting our time … for some reason I thought zig could compile str=="world" I am so embarrassed!

the fact that it doesn’t … means a switch on []u8 does not have a natural place.
