I think it would work better if, when self-modifying, the AI ran simulations of the result and evaluated them against its current utility function. There would still be a problem, though, if the modification ended up altering the utility function itself, or if it modified away the simulation requirement.
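A minimal sketch of that simulate-and-check gate, assuming a toy environment and placeholder names (`current_utility`, `simulate`, and the example policies are all hypothetical, not anyone's actual proposal):

```python
import random

def current_utility(outcome: float) -> float:
    # Placeholder for the agent's current utility function (hypothetical).
    return outcome

def simulate(candidate_policy, n_runs: int = 1000) -> float:
    # Roll out the candidate policy in a toy environment and return
    # its mean score under the *current* utility function.
    total = 0.0
    for _ in range(n_runs):
        state = random.random()  # toy world state
        total += current_utility(candidate_policy(state))
    return total / n_runs

def approve_self_modification(current_policy, candidate_policy) -> bool:
    # The safeguard described above: adopt a modification only if
    # simulations show it does at least as well as the status quo,
    # judged by the current utility function. Note the two failure
    # modes from the comment: a candidate that rewrites
    # current_utility, or one that deletes this check, would slip
    # past all future iterations of the gate.
    return simulate(candidate_policy) >= simulate(current_policy)

# Usage: a candidate that improves the toy score gets approved.
baseline = lambda s: s
candidate = lambda s: min(1.0, 2 * s)
print(approve_self_modification(baseline, candidate))  # True (w.h.p.)
```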
It's hard to balance the ability to create new, more useful utility functions against the risk of creating one that is at odds with what the original entity valued.
> It's hard to balance the ability to create new, more useful utility functions against the risk of creating one that is at odds with what the original entity valued.
I think that for such an AI, the concept of a "more useful utility function" would be nonsense. The AI's definition of 'useful' just is its utility function. No other utility function can ever rate higher against the current utility function than the current utility function rates itself.
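A sketch of the same point in symbols, under the assumption that a candidate utility function $U'$ is scored by how well its optimal policy $\pi^*_{U'}$ performs as judged by the current function $U$ (my formalization, not from the thread):

$$\mathrm{score}_U(U') \;=\; \mathbb{E}_{\pi^*_{U'}}\!\left[U\right] \;\le\; \max_{\pi}\, \mathbb{E}_{\pi}\!\left[U\right] \;=\; \mathbb{E}_{\pi^*_{U}}\!\left[U\right] \;=\; \mathrm{score}_U(U),$$

so no candidate can beat the incumbent on the incumbent's own terms.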