Go enjoy Python3

crawshaw · on Aug 27, 2015

There are several ways to solve this in Go. The first that comes to mind, assuming you want to truncate to the first 12 runes, not bytes:

        func main() {
            v := []rune(os.Args[1])
            if len(v) > 12 {
                v = v[:12]
            }
            fmt.Println(string(v))
        }

Or more in the spirit of the C example in the post:

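        func main() {
            res := make([]rune, 12)
            // copy returns the number of runes copied; slice to it so that
            // inputs shorter than 12 runes don't print trailing NUL runes.
            n := copy(res, []rune(os.Args[1]))
            fmt.Println(string(res[:n]))
        }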

Note that res will stay on the stack, just like C.

I expect the author is trying to say something about Go that I'm not quite getting. Perhaps that it is not an expression-based language, so to make code readable you need to make use of multiple statements. That's by design, but I understand it may be unappealing if you want to program in an expression-heavy style.

jerf · on Aug 27, 2015

"I expect the author is trying to say something about Go that I'm not quite getting."

I assume "Go sucks because, look, this one weird case is a bit ugly." (that is, as rhetoric, not dialectic; it is not literally claiming "one case bad" -> "Go is bad" in the logical sense.) A weird case that I've programmed many thousands of lines of Go code in but never once encountered. Taking a slice out of a string blind like that is actually a bit rare; usually in some way it turns out you actually have length information somewhere in the environment. It's hardly like "slice index out of bounds" is some sort of terrible error... it is, at least, arguable that Python is in the wrong here for being so willing to return a string generated by [0:12] that is not 12 bytes/characters in length, which seems like a reasonable assumption to make of such an operation.

Now, if we want to talk about little examples like this, let's talk about sending on something like a channel in Python, to say nothing of Python's implementation of the "go" keyword... oh, yes, I see, suddenly this is an unfair way to compare languages.

Yes, it is.

bsaul · on Aug 27, 2015

This post shows two very common issues that programmers have with the Go language when they start using it (that includes me), especially since Go is advertised as a compiled language with the feel of a dynamic one:

A low-level feel when manipulating arrays (or slices), and poor support for generic functions (that would be math.min in this example).

jerf · on Aug 27, 2015

If it said that explicitly, I'd be fine with it.

But given the last paragraph, I don't think that's the most likely interpretation.

And it's still a terrible way to judge languages without a lot more context. All languages have gotchas that fit into 3-5 lines. Python's got a pretty decent set: https://www.google.com/search?q=python%20gotchas It's still a good language.

And let me be very clear: I'm not "defending" Go here... I quite like both Python and Go. I've got no trouble saying Python is incrementally easier than Go when it comes to dealing with strings (but both are beat by Perl). (Especially since the incremental advantage comes at a stiff performance price. Sometimes that's fine, sometimes that's not.) I'm specifically saying, as a computer language polyglot, that this metric for measuring languages is terrible. It's a rationalization, not a rational argument.

bsaul · on Aug 27, 2015

I see your point, but after having coded a full (minor) project in Go, I can assure you that those two points alone (cumbersome array data structure and lack of generic code) made me think twice about using this language for the common "web service for CRUD to DB" use.

Then I looked at what Go's data access layer libraries look like, and that finished convincing me not to use it unless performance and memory usage were a crucial matter.

rdtsc · on Aug 27, 2015

> , let's talk about sending on something like a channel in Python:

  import queue; q = queue.Queue(); q.put(1)

> to say nothing of Python's implementation of the "go" keyword...

Why would Python have a go keyword? Go doesn't have the "except" keyword that Python has, not sure what the point is?
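To make the one-liner a bit more concrete, here is a minimal sketch of a thread sending values over a `queue.Queue`, which is only a rough analogue of a goroutine sending on a channel, not an equivalent. The `producer` and `consume_all` names and the `None` sentinel are conventions made up for this sketch.

```python
import queue
import threading


def producer(q, items):
    # Rough stand-in for a goroutine sending on a channel.
    for item in items:
        q.put(item)
    # Sentinel meaning "no more items" (a convention of this sketch only).
    q.put(None)


def consume_all(q):
    received = []
    while True:
        item = q.get()  # blocks until something is available
        if item is None:
            break
        received.append(item)
    return received


q = queue.Queue()
worker = threading.Thread(target=producer, args=(q, [1, 2, 3]))
worker.start()
received = consume_all(q)
worker.join()
print(received)
```

The queue is FIFO and there is a single producer, so the items arrive in the order they were put.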

pekk · on Aug 27, 2015

Go is frequently presented as a replacement of Python. When people hear that, it sets up an expectation that Go will have the same pleasant qualities of Python, when it doesn't, any more than Python has goroutines.

nkozyra · on Aug 27, 2015

Go is more frequently presented as a replacement for C++/Java with a syntax that feels more like an interpreted language like Python or Ruby.

I find that to be totally true. It's certainly lighter to write in than C++ or Java, "go run" effectively feels like running the interpreter and eschewing {} and ; lends to the latter, as well.

And Python has concurrency options as well - "goroutines" is, obviously, relegated to Go.

pekk · on Aug 27, 2015

Interviews with Pike et al. have always made clear that Go actually was made to compete with Java and C++. I wouldn't argue with that.

But the rank and file as represented on HN among other places presents Go as a replacement for Python all the time. It's one of the most common memes about Go. And this sets up expectations Go wasn't designed to fulfill. When a new user honestly reports that Go doesn't fulfill those expectations, we yell at him for making a dishonest and unfair comparison when really, we set up the dishonest and unfair comparison ourselves when we promoted Go as a replacement for Python. As long as we continue to promote Go that way, we should expect people to compare them, and we shouldn't yell at them for making honest reports that Go and Python are different in ways they are designed to be.

derefr · on Aug 27, 2015

I don't think anyone explicitly marketed Go as a replacement for Python; instead, Go was marketed as, for some use-cases (low-level-ish software), what you should have been using in the first place: places where you should have been using C++/Java, not Python, but where Python was used anyway because the alternatives were too unwieldy.

Jabbles · on Aug 27, 2015

fmt.Printf("%.12s", os.Args[1])
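For comparison, Python's format mini-language has the same "precision truncates a string" behaviour, and as far as I know it counts code points rather than bytes. A small sketch, with sample strings invented for illustration:

```python
ascii_arg = "abcdefghijklmnopqrstuvwxyz"
cjk_arg = "\u8eca" * 20  # 20 copies of one CJK character, 3 bytes each in UTF-8

# Three spellings of "at most 12 characters".
truncated_ascii = format(ascii_arg, ".12")
truncated_cjk = "{:.12}".format(cjk_arg)
short = "%.12s" % "short"  # inputs shorter than 12 pass through unchanged

print(truncated_ascii)
print(len(truncated_cjk), len(truncated_cjk.encode("utf-8")))
print(short)
```

This only shows code point truncation; it does nothing about combining characters.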

johannesboyne · on Aug 27, 2015

+1 for simplicity

pjmlp · on Aug 27, 2015

I assume it has to do with Unicode support.

masklinn · on Aug 27, 2015

> Simple enough, in essence given first argument, print it up to length 12. As an added this also deals with unicode correctly

That's not true, Python 3 uses codepoint-based indexing but it will break if combining characters are involved. For instance:

    > python3 test.py देवनागरीदेवनागरी
    देवनागरीदेवन

because there is no precombined version of the multi-codepoint grapheme clusters, so some of these 10 user-visible characters take more than a single codepoint, and you end up with 8 user-visible characters rather than the expected 10.

edit: the original version used the input string "ǎěǐǒǔa̐e̐i̐o̐u̐ȃȇȋȏȗ" where clusters turn out to have precomposed versions after all. Replaced it by devanāgarī repeated once (in the devanāgarī script)
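A rough way to check those counts with only the standard library, as a sketch: counting code points that are not combining marks is a crude approximation of "user-visible characters" that happens to work for this input, but it is not real grapheme segmentation (UAX #29). `approx_visible_chars` is a helper name made up here.

```python
import unicodedata

# "devanagari" written in Devanagari script: 8 code points, 5 visible characters.
WORD = "\u0926\u0947\u0935\u0928\u093e\u0917\u0930\u0940"
text = WORD * 2


def approx_visible_chars(s):
    # Assumption of this sketch: every code point whose general category is
    # a mark (Mn, Mc, Me) attaches to the preceding base character.
    return sum(1 for ch in s if not unicodedata.category(ch).startswith("M"))


cut = text[:12]  # what a code-point-based slice gives you
print(len(text), approx_visible_chars(text))
print(len(cut), approx_visible_chars(cut))
```

The slice keeps 12 code points but ends on a bare LETTER NA, with its vowel sign cut off.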

Veedrac · on Aug 27, 2015

The easy Python way:

    import sys
    import regex
    print(regex.match(r"\X{,12}", sys.argv[1]).group())

with the regex[1] package that should be in the stdlib Any Day Now™.

[1]: https://pypi.python.org/pypi/regex

Spiritus · on Aug 27, 2015

Interesting, I had no idea the `re` module was getting revamped. Scheduled for 3.5 or later?

Veedrac · on Aug 27, 2015

Certainly not 3.5, although a few years ago I would have told you almost the exact opposite.

I wouldn't hold your breath. The issue tracker[1] suggests 3.7 or 3.8 as optimistic. Guido made some comment somewhere relatively recently, but I can't find where. It's entirely possible it will never actually happen; time doesn't seem to have made people more enthusiastic.

It's a shame, because the new module is awesome.

[1] http://bugs.python.org/msg230846

stevenbedrick · on Aug 27, 2015

Yup. A long time ago, while working on a project with some particularly gnarly Unicode issues, I got in the habit of thinking in terms of grapheme clusters instead of code points (or "characters", for whatever definition of "character" one wishes to use), and it has served me very well. Combining characters pop up in the most interesting places, often where and when you least expect them! ٩(•̃̾●̮̮̃̾•̃̾)۶

Ruby's unicode_utils gem has a nice implementation of the standard grapheme cluster segmentation algorithm, and Python's wrapper around ICU works quite well. Go's concept of runes is certainly an improvement, but it doesn't handle combining characters out of the box...

masklinn · on Aug 28, 2015

> Combining characters pop up in the most interesting places, often where and when you least expect them! ٩(•̃̾●̮̮̃̾•̃̾)۶

The good news is Unicode 8 will make them way more frequent! (alternate emoji skin colors are specified via combining characters) much as Unicode 6 made astral characters way more "in your face" (by standardising emoji in the SMP)

bmn_ · on Aug 27, 2015

Languages that cannot deal with graphemes are lame. I daresay the solution below should score 20 on OP's imaginary scale.

    $ perl -CADS -E'say $ARGV[0] =~ /(\X{5})/' देवनागरीदेवनागरी
    देवनागरी

Length of input string is: 10 graphemes, 16 codepoints, 48 octets (UTF-8).

Length of output string is: 5 graphemes, 8 codepoints, 24 octets (UTF-8).

hahainternet · on Aug 27, 2015

That's a shame, it works as you'd expect in perl6:

  sub MAIN($s) { say $s.substr(0,12) }

  $ perl6 test.p6 ǎěǐǒǔa̐e̐i̐o̐u̐ȃȇȋȏȗ
  ǎěǐǒǔa̐e̐i̐o̐u̐ȃȇ

masklinn · on Aug 27, 2015

Turns out there are precomposed versions of these clusters, so your system might just be using these.

Could you retry with the input "देवनागरीदेवनागरी"?

hahainternet · on Aug 27, 2015

I'm not quite sure how to interpret the output as it doesn't render particularly kindly in my terminal:

  sub MAIN($s) {
  	say "{$s.chars}: $s";
  	my $b =  $s.substr(0,12);
  	say "{$b.chars}: $b";
  }

  $ perl6 hn-test2.p6 देवनागरीदेवनागरी
  16: देवनागरीदेवनागरी
  12: देवनागरीदेवन

masklinn · on Aug 27, 2015

So apparently perl6 is also "wrong" and operates on codepoints; your system composed my original string, and each (base, diacritic) pair was pasted as a single precomposed character (I expect that if you try out the Python version on your system you'll also get the "right" answer).

The new string is composed of 10 user-visible characters (5 characters repeated twice) but 16 codepoints (and this time I carefully checked that there was no precomposed version):

    DEVANAGARI LETTER DA
    DEVANAGARI VOWEL SIGN E
    DEVANAGARI LETTER VA
    DEVANAGARI LETTER NA
    DEVANAGARI VOWEL SIGN AA
    DEVANAGARI LETTER GA
    DEVANAGARI LETTER RA
    DEVANAGARI VOWEL SIGN II
    DEVANAGARI LETTER DA
    DEVANAGARI VOWEL SIGN E
    DEVANAGARI LETTER VA
    DEVANAGARI LETTER NA
    DEVANAGARI VOWEL SIGN AA
    DEVANAGARI LETTER GA
    DEVANAGARI LETTER RA
    DEVANAGARI VOWEL SIGN II

Operating on codepoints, both versions cut after the second DEVANAGARI LETTER NA (न) breaking that grapheme cluster (it should be ना) and not displaying the final two clusters ग and री.

raiph · on Aug 27, 2015

> So apparently perl6 is also "wrong" and operates on codepoints

Yes and no. Yes, because the in-development Rakudo compiler is clearly currently giving the wrong result, and no because it operates on grapheme clusters (but has bugs).

(You can work with codepoints if you really want to but the normal string/character functions that use the normal string type, Str, work -- or more accurately are supposed to work -- on the assumption that "character" == grapheme cluster; afaik it's supposed to match the Unicode default Extended Grapheme Cluster specification.)

Fwiw I've filed a bug: https://rt.perl.org/Ticket/Display.html?id=125927

hahainternet · on Aug 27, 2015

Yeah you're right, a caveat in the docs says that current implementations aren't finished with this. I was under the impression the NFG work was done but I'll catch up with people on irc.

raiph · on Aug 27, 2015

> I expect that if you try out the Python version on your system you'll also get the "right" answer.

I don't think so. In my tests standard python (2.7 and 3.5) ignores grapheme clusters.

masklinn · on Aug 28, 2015

Python ignores grapheme clusters; that point was about my original test case, which used grapheme clusters that I later found out had precomposed equivalents, so a transfer chain performing NFC would leave the test case with no combining characters (or multi-codepoint grapheme clusters) left in it.
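A quick sketch of that effect using the standard `unicodedata` module: NFC folds a base letter plus combining caron into one precomposed code point, while the Devanagari string has nothing to fold into. The sample pair is just an illustration, not the exact bytes from the earlier comment.

```python
import unicodedata

decomposed = "a\u030c"  # 'a' followed by COMBINING CARON (2 code points)
composed = unicodedata.normalize("NFC", decomposed)

devanagari = "\u0926\u0947\u0935\u0928\u093e\u0917\u0930\u0940" * 2
devanagari_nfc = unicodedata.normalize("NFC", devanagari)

print(len(decomposed), len(composed))
print(len(devanagari), len(devanagari_nfc))
```

So after NFC the Latin test case no longer exercises combining characters at all, while the Devanagari one still does.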

raiph · on Aug 28, 2015

Gotchya.

flohofwoe · on Aug 27, 2015

Doesn't the C version have a serious bug? If the input string has 12 or more characters, the destination string will not be zero-terminated.

From the strncpy docs:

"No null-character is implicitly appended at the end of destination if source is longer than num. Thus, in this case, destination shall not be considered a null terminated C string (reading it as such would overflow)."

ansible · on Aug 27, 2015

I'm usually sticking +1s to the storage for any strings for this purpose. So if I want to operate on MAXLEN number of characters, I'll allocate MAXLEN+1 for the character array.

And oftentimes I'll be memset()'ing the destination to all NULs when doing a string copy operation. I'm not real happy with string handling in C... as if that should be surprising to anyone.

Say, is there a nice, small string library in C, suitable for embedded use, that anyone would care to recommend? I just want a nice string type that carries around its length and storage length, handles copies properly, and has the usual utilities. I suppose I could just write one...

rch · on Aug 27, 2015

You might look at the one from Redis:

https://github.com/antirez/sds

ansible · on Aug 27, 2015

That's interesting. Thanks for the link.

Ianvdl · on Aug 27, 2015

The author awards some arbitrary points to C even though his implementation of the solution is broken. His similarly poor Go implementation receives zero of these arbitrary points.

Why does this deserve the attention of everyone here? The author did not compare languages, he compared his aptitude with these languages, and considered broken implementations to somehow be comparable.

A more meaningful comparison would be to implement simple, efficient, working solutions to these problems and compare them. This, as it stands, does not lead to any useful discussion.

BinaryIdiot · on Aug 27, 2015

I'm not sure what the takeaway is from this blog entry. Is it that Python 3 can do substrings easier than the other languages therefore we should use Python 3? That was what I thought it was, anyway.

Seems silly to pick a language based off this single, silly criterion; otherwise why not JavaScript, or probably other languages that can make the code even smaller?

console.log(mystring.substring(0, 12));

So it just seems arbitrary and weak in my opinion.

steeleduncan · on Aug 27, 2015

The entire scenario seems to have been constructed to highlight the runtime panic caused by out of bounds slices in Go. Either that or the well-known and well-discussed lack of generics.

_kst_ · on Aug 27, 2015

There are at least three major flaws in the 7-line C program, even ignoring character set issues. (main returns int, argv[1] can be null, and strncpy doesn't always null-terminate the target). If you're going to compare languages, you should find someone who knows each of them well.

Daishiman · on Aug 27, 2015

The Unicode situation in most languages is dismal.

Honestly though, the lack of generics for that Math.min function makes me happy I'm not programming in Go.

insertnickname · on Aug 27, 2015

    if a > b {
        // use a
    } else {
        // use b
    }

ridiculous_fish · on Aug 27, 2015

Oh dear. You had one job, min!

Veedrac · on Aug 27, 2015

That's actually the wrong way around.

insertnickname · on Aug 28, 2015

Yeah, I was thinking of max, not min, sorry. My point was that it's a trivial thing to write, your generic max (or min) is right there. The math.Max() (and math.Min()) function is not trivial, it handles certain special cases, and that's probably why it was included.[1]

[1] https://golang.org/pkg/math/#Max
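To illustrate what "special cases" means here, a sketch in Python rather than Go: a comparison-only min gives order-dependent answers with NaN and cannot tell -0 from +0. `nan_aware_min` is a hypothetical helper that mimics the cases the Go docs list for math.Min (NaN, -Inf, signed zero); it is not Go's implementation.

```python
import math


def naive_min(a, b):
    # The "trivial" version: a single comparison.
    return a if a < b else b


def nan_aware_min(a, b):
    # Hypothetical helper for this sketch, not Go's actual code.
    if math.isnan(a) or math.isnan(b):
        return math.nan
    if a == 0 and b == 0:
        # -0.0 == 0.0 compares equal, so pick the negative zero explicitly.
        return a if math.copysign(1.0, a) < 0 else b
    return a if a < b else b


print(naive_min(math.nan, 1.0), naive_min(1.0, math.nan))
print(nan_aware_min(math.nan, 1.0), nan_aware_min(1.0, math.nan))
print(nan_aware_min(0.0, -0.0))
```

For integers none of this applies, which is why the hand-written if/else is fine there.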

BossHogg · on Aug 27, 2015

Article content aside, the slide out side menu that covers the scroll bar is incredibly annoying. Is that Blogger? Whatever it is needs to stop. Now.

ddevault · on Aug 27, 2015

The C code there fails if the unicode string includes characters whose width is greater than one octet.

zokier · on Aug 27, 2015

Which is noted right in the post:

> This treats things as byte-array instead of unicode, thus for unicode test it will end up printing just 車賈滑豈.

rakoo · on Aug 27, 2015

Which is useless then, because the output can't safely be considered a string anymore. I don't really see the point of writing the C "equivalent" and giving it any points when it doesn't even do the right thing.

masklinn · on Aug 27, 2015

None of the snippets comes even remotely close to doing the right thing so it doesn't really matter.

darkstalker · on Aug 27, 2015

Rust version:

    fn main()
    {
        if let Some(arg) = std::env::args().nth(1)
        {
            println!("{}", arg.chars().take(12).collect::<String>()); // chars() iterates over codepoints
        }
    }

Veedrac · on Aug 27, 2015

Idiomatic Rust would probably avoid allocations, which means something more like

    fn main() {
        if let Some(arg) = std::env::args().nth(1) {
            println!("{}", {
                match arg.char_indices().nth(12) {
                    Some((idx, _)) => &arg[..idx],
                    None => &*arg
                }
            });
        }
    }

With the `unicode-segmentation` crate[1], you can just swap `char_indices()` with `grapheme_indices(true)`.

[1] https://crates.io/crates/unicode-segmentation

Skunkleton · on Aug 27, 2015

How is this on the front page of hacker news? What a shit post.

edofic · on Aug 27, 2015

A mandatory smart-ass Haskell response

    import System.Environment (getArgs)
    main = do
      [str] <- getArgs
      putStrLn $ take 12 str

nicolast · on Aug 27, 2015

Now with more operators!

    import System.Environment (getArgs)
    main = putStrLn =<< take 12 . head <$> getArgs

;-)

joeyh · on Aug 27, 2015

The actual smart-ass Haskell response is simply "take 12". The spec didn't specify this needed to be an impure shell command, so a pure function is obviously better.

coldtea · on Aug 27, 2015

Well, for a smart-ass response (and I know you meant it as a joke) it's not very impressive. It doesn't do anything more than the others, and the syntax is not so great either.

Veedrac · on Aug 27, 2015

On the contrary, his is the only one that crashes when more arguments than expected are passed. Hooray progress!

_pmf_ · on Aug 27, 2015

Of course, the C version could be just

    printf("(%.12s)\n", argv[1]);

pjmlp · on Aug 27, 2015

Assuming using 7 bit ASCII

_kst_ · on Aug 27, 2015

No, it merely assumes one byte per character. For example, it would work correctly in Latin-1 or EBCDIC.

In any case, the problem statement (though it's a bit vague) requires building a truncated string, not just printing it.

pjmlp · on Aug 27, 2015

It is enough to have mixed 8 byte code pages and then it is worthless.

kevin_thibedeau · on Aug 27, 2015

s/printf/sprintf/

jackielii · on Aug 27, 2015

why can't I downvote this!!! erhhhh

IshKebab · on Aug 27, 2015

Now try distributing your Python code as a single statically linked exe.

PyComfy · on Aug 27, 2015

http://nuitka.net/pages/overview.html

chapium · on Aug 27, 2015

Completely off topic, so if you are looking for discussion about the article skip this.

The low contrast ratio and bright colors on this blog are a bit hard to read. I normally switch to readability mode in Safari when I encounter this, but the site's layout prevents this from working.

jofer · on Aug 27, 2015

The text is black on white... Am I missing something?

BinaryIdiot · on Aug 27, 2015

Hmm, are you referring to something very specific? The contrast ratio is incredibly high (black text on white background). The navigation bar has terrible contrast but that's all I saw.