Hi, I’m Ted and I’m new here! I love this initiative. I watched a very nice YouTube talk by you guys and got the impression that some of you may be interested in helping out with writing a proposal for a new feature for inclusion in some future standard.
I’d welcome some help with this idea of adding a number of erase-remove algorithms that aren’t stable. I’ve never written a C++ proposal before.
I wanted this some years ago when I was shuffling data back and forth with some filtering and really didn’t care what order the data was in - and I didn’t want to pay for the extra moves that come with keeping it stable - so I wrote a basic implementation of unstable_remove_if, which can be used to implement the various container-specific unstable_erase_ifs, etc. It guarantees that at most as many elements are moved as the predicate marks for removal, while stable removal has to move every element that follows a removed one.
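To give a feel for the shape of the algorithm, here’s a minimal sketch (illustrative only, not my actual code; I’m assuming bidirectional iterators):

```cpp
#include <algorithm>
#include <utility>

// Fills each hole left by a removed element with an element taken from the
// back of the range, so at most one move is paid per removed element.
template <class BidirIt, class UnaryPred>
BidirIt unstable_remove_if(BidirIt first, BidirIt last, UnaryPred pred) {
    for (;;) {
        // Find the next element to remove.
        first = std::find_if(first, last, pred);
        if (first == last) return first;
        // Walk backwards past elements that should also be removed.
        do {
            if (first == --last) return first;
        } while (pred(*last));
        // Move a keeper from the back into the hole.
        *first = std::move(*last);
        ++first;
    }
}
```

A container-specific unstable_erase_if would then simply call the container’s erase with the returned iterator and end().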
I mostly work in C++17, so perhaps a feature like this is obsoleted by views::filter in ranges, but physically removing elements will probably still be necessary in the future.
The code I’ve written is absolutely not in shape for inclusion in any library as-is. I wrote it for my specific needs back then, but I think it could be a good alternative to the stable erasure that is the go-to solution right now, so if it sounds interesting, I (or we) could pick it up and finish it along with a proposal.
Please let me know if you think it’s worthy of spending some energy on.
Can you elaborate on what unstable_erase_if does? remove_if is described on cppreference.com as doing this:
Removes all elements satisfying specific criteria from the range [first, last) and returns a past-the-end iterator for the new end of the range.
3) Removes all elements for which predicate p returns true.
What would an analogous description of your unstable_remove_if be?
Let me take a crack at that based on a quick look at the code and the example. It’s the same functionality as erase_if and remove_if, without maintaining the requirement that the algorithm be stable.
As a result, at least in the sample below, there’s far less moving of data (2 moves versus 23 moves), since the last element of the range can simply be swapped with the removed element.
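To make the move counting concrete, here’s one way such numbers can be measured (my sketch; the thread’s actual sample isn’t reproduced here):

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

// An element type that counts how often it is moved, so the stable and
// unstable variants can be compared.
struct Counted {
    static inline std::size_t moves = 0;
    int value = 0;
    Counted(int v) : value(v) {}
    Counted(Counted&& other) noexcept : value(other.value) { ++moves; }
    Counted& operator=(Counted&& other) noexcept {
        value = other.value;
        ++moves;
        return *this;
    }
};

int main() {
    std::vector<Counted> v;
    for (int i = 0; i < 10; ++i) v.emplace_back(i);

    Counted::moves = 0; // ignore moves made while growing the vector
    std::erase_if(v, [](const Counted& c) { return c.value % 2 == 0; }); // stable erasure
    std::cout << "moves: " << Counted::moves << '\n';
}
```

Running the same measurement over an unstable variant is what would show the difference.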
If you used views::filter here with ranges::to, you’d get a copy of every char that doesn’t meet the criteria - just like the stable version. So this is likely even less efficient because of the extra allocations.
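For comparison, here’s roughly what the filter-based alternative looks like (my illustration; is_vowel and the strings are invented for the example):

```cpp
#include <ranges>
#include <string>
#include <string_view>

int main() {
    auto is_vowel = [](char c) {
        return std::string_view{"aeiou"}.find(c) != std::string_view::npos;
    };

    // In-place, stable erasure (C++20): no extra allocation.
    std::string s1 = "some example text";
    std::erase_if(s1, is_vowel);

    // views::filter + ranges::to (C++23): allocates a new string and
    // copies every char that is kept.
    std::string s2 = "some example text";
    auto kept = s2
              | std::views::filter([&](char c) { return !is_vowel(c); })
              | std::ranges::to<std::string>();
}
```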
I think it’s interesting. I guess the question really is how common the use case is - would enough users find it worth having in the standard library? One question that every proposal has to answer is: does this belong in the standard? For the general question of useful algorithms, that was answered long ago: yes. Whether this particular algorithm is important enough to include is probably a question only the committee can answer.
My personal take is that useful generic algorithms are the bread and butter of the standard library, since they simplify code development in every domain C++ covers. At C++Now, writing algorithms is a perennially popular topic for Library in a Week - because it’s small enough to get something real done in a short time, yet super educational because writing algorithms is way harder than it looks at first glance.
Note that some of the committee have outlined what they would like to see in algorithms/ranges in P2760. Most of that plan will not be achieved, as only a few ranges views have been proposed and the design freeze for C++26 is a couple of weeks out. Going forward, I’d like to see Beman be a hotbed for this part of the library.
So @TedLyngmo, if you want to go down this path, buckle in for a potentially long ride. Starting now, we’d certainly get some of the evolution groups to look at this within the next 1.5 years or so. David, others here, and I can teach you the standardization ropes.
I’m a bit ashamed to admit that I hadn’t saved my old tests, so I created a few simple test cases to show what would be gained by using the set of unstable_* functions.
I’m not sure what’s changed since I wrote the functions, but their performance is really bad. Only with very contrived data sets will they outperform the “stable” versions. I assume cache locality explains most of the difference. I must have tested them with some bias back then, or used an ancient computer.
Anyway, I no longer think they are such a good idea.
I don’t know how long ago this was, but I can say unequivocally that compilers and libraries are under constant improvement. In my experience it is difficult, at best, to beat the smart folks who spend their working lives doing this. They have all the information about what doesn’t work (bug reports), have seen what does, and hence are continually improving.
That said, it can also be the machines - branch prediction, cache locality, etc. The improvements are relentless and impressive.
It’s a fair conclusion, but of course we’d love to have you stay and participate in other development. If not, thanks for now!
I think unstable_remove_if is essentially the same as std::partition with the function result inverted. Credit to Dave Abrahams for pointing this out to me.
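For illustration, a quick sketch of that equivalence (my example; the vector and predicate are made up):

```cpp
#include <algorithm>
#include <vector>

int main() {
    std::vector<int> v{1, 2, 3, 4, 5, 6};
    auto to_remove = [](int x) { return x % 2 == 0; };

    // Invert the predicate so the elements to keep end up in front:
    auto new_end = std::partition(v.begin(), v.end(),
                                  [&](int x) { return !to_remove(x); });
    v.erase(new_end, v.end()); // v now holds 1, 3, 5 in some unspecified order
}
```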
Hmm, yeah, Dave is right - it goes to show that even once you think you understand the library, you might not. When prompted properly, the statistical mechanical turk basically gets it right.
chatgpt: compare std::ranges::remove_if with std::ranges::partition
Comparison Summary

| Feature | std::ranges::remove_if | std::ranges::partition |
| --- | --- | --- |
| Purpose | Moves unwanted elements to the end | Moves elements to separate groups |
| Removes elements? | No, requires erase to shrink | No, just reorders |
| Order preserved? | Yes | No (unstable) |
| Return value | Iterator to logical end of remaining elements | Iterator to partition point |
| Efficiency | O(n) moves/swaps | O(n) moves/swaps |
Use remove_if when you want to filter out elements while keeping order.
Use partition when you just want to separate elements into two groups.
Yes, partition is much like unstable_*, except that unstable_* does not promise anything about the moved-from elements’ state other than that it’s valid in some sense. Moving from, instead of swapping with, should be cheaper. Having learned from my previous assumption, though, I’m not going to claim that that’s how it actually plays out.
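To illustrate the distinction (my sketch, not code from either implementation):

```cpp
#include <utility>

// Filling the "hole" left by a removed element with an element from the back.
template <class T>
void fill_hole_by_move(T& hole, T& back) {
    hole = std::move(back); // one move; `back` is left valid but unspecified
}

template <class T>
void fill_hole_by_swap(T& hole, T& back) {
    using std::swap;
    swap(hole, back); // generic swap costs three moves for types without a custom swap
}
```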
I do follow the evolution of the implementations, mostly on the library side, and I do my bit of reporting unexpected behavior to the “big three”. I also looked at the actual implementations in the wild to see what cleverness they might have going on that could beat my naive “move as few as possible” idea, but no, there’s no fancy stuff. It’s straightforward, staying in line - and that’s why I believe it’s pure hardware that makes it efficient. My algorithm might have made sense before cache lines.