It's easy to see how, for strongly negative inputs, the softmax1 operator could simply refrain from making a decision,
e.g. ``` sum(softmax1([-100, -100, -100])) ~= 1e-43 ```
But is there any basis to assume commas and whitespace will be negatively correlated with other tokens?
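For anyone who wants to check the number above, here is a minimal sketch. It assumes `softmax1` means the "plus one in the denominator" variant, i.e. `exp(x_i) / (1 + sum_j exp(x_j))`; that definition and the helper names are my assumption, not something spelled out in this thread.

```python
import math

def softmax(xs):
    # Standard softmax, shifted by the max for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def softmax1(xs):
    # Assumed definition: exp(x_i) / (1 + sum_j exp(x_j)).
    # The extra 1 behaves like an implicit logit of 0, so the shift
    # is taken over max(0, max(xs)) to keep it stable as well.
    m = max(0.0, max(xs))
    exps = [math.exp(x - m) for x in xs]
    total = math.exp(-m) + sum(exps)
    return [e / total for e in exps]

scores = [-100.0, -100.0, -100.0]
print(sum(softmax(scores)))   # weights always sum to 1, however negative the scores
print(sum(softmax1(scores)))  # roughly 1.1e-43: almost no weight is assigned
```

Note that the total weight only collapses like this when every score is far below zero; with all scores at 0 the same function still hands out 3/4 of the weight, so the open question is whether the model actually learns such negative scores for those tokens.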