henrydark's comments

The innovation is to find traces of a global cosmic-ray event with which to connect the dating of objects in one local area, Greece, where the dendrochronological record is not continuous, to objects in faraway areas, for example England/Ireland, where we do have a continuous dendrochronological record.


I get the sentiment, but personally I can easily imagine myself writing an autocompleter that would work fine with SELECT before FROM. (I don't write much SQL, so I don't.)

Just to clarify, my point is that when we do write SQL, most of us start by writing the FROM part, and even if we didn't, I could just offer all columns from all tables I know about, with some heuristic for their order, when autocompleting the SELECT part.


I must go on compiling / You can't break that which isn't yours / I must go on packaging / I'm not my own, it's not my choice


This is lovely, I didn't know. I guess this is what Kuhn was talking about: we write history in retrospect, sorting it out and preferring narrative over fact.


Lots of people are saying that having large files in a repo is wrong, bad, bad design, incorrect usage.

Forget that you know git, github, git-lfs, even software engineering for a moment. All you know is that you're developing a general project on a computer, you are using files, and you want version history on everything. What's wrong with that?

The major issue with big files is resources: storage and network bandwidth. But for both of these it is the sum of all object sizes in a repo that matters, not any particular file, so it's weird to keep harping on big files being bad design or evil.


I did just over a decade in chip design. Versioning large files in that domain is commonplace and quite sane. It can take wall-clock days of processing to produce a layout file that's hundreds of MBs. Keeping that asset in your SCC system alongside all the block assets it was built from is very desirable.

Perforce handled it all like a champ.

People who think large files don't belong in SCC are...wrong.


That's why Perforce is still the SCM of choice for a lot of creatives.

I don't know if they still do it, but Unreal used to ship a Perforce license with their SDK.


That's also why Perforce is slow as heck unless you throw massive resources at it. I also work in the chip industry, BTW.


I occasionally used to start a sync, go get coffee, chat with colleagues, read and answer my morning email, browse the arXiv, and then wait a few more minutes before I could touch the repo. In retrospect, I should have set up a cron job for it all, but it wasn't always that slow and I liked the coffee routine. We switched to git. Git is just fast. Even cloning huge repos is barely enough time to grab a coffee from down the hall.


I mean "massive resources" is just de rigeur across the chip industry now. The hard in hardware is really no longer about it being a physical product in the end.


I've only used Perforce for two years and it didn't feel slow at all. The company wasn't exactly throwing money at hardware.


I don't like it (but used it for many years).

I love Git, but, then, I don't have a workflow that would benefit from Perforce.


> Lots of people are saying that having large files in a repo is wrong, bad, bad design, incorrect usage.

I don't think that is true. You do see people warn that having large files in Git repositories, or in any repository that wasn't designed with support for large files in mind, is "wrong", in the sense that there are drawbacks to using a system that was not designed to handle them.

Here's a historical post of Linus Torvalds commenting on Git's support for large files (or lack thereof):

https://marc.info/?l=git&m=124121401124923&w=2


> Forget that you know git, github, git-lfs, even software engineering for a moment. All you know is that you're developing a general project on a computer, you are using files, and you want version history on everything. What's wrong with that?

THANK YOU. Fucking prescriptivists ruin everything.


How is it not bad design? Let's say you are working in a team. Would you really want your colleagues spending a significant amount of time cloning your artifacts? Your comment is also not consistent with its own premise of forgetting that one is a developer. Even if it's my grandma, she's not gonna want to wait an hour to download a giant file from version control, assuming she even knows what version control is. Large blobs can go into versioned object storage like GCS or S3.


In Subversion at least, you'd do a partial checkout. If you don't need a particular directory you just don't check it out. If you lay out your repo structure well there's no problem. It was incredibly convenient.
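
For anyone who hasn't used it, a sparse checkout in Subversion looks roughly like this (the repo URL and paths are made up for illustration):

    # grab only the top level, then deepen the parts you actually need
    svn checkout --depth immediates https://svn.example.com/repo/trunk project
    cd project
    svn update --set-depth infinity src     # pull down the full source tree
    # the multi-GB assets/ directory stays on the server until you ask for it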

I've tried many different SCMs over the years, and I was happy when git took root, but its poor handling of large files was problematic from the beginning. Git being bad at large files turned into a best practice of not storing large files in git, which was shortened to "don't store large files in SCM." I think that's a huge source of our availability and/or supply-chain headaches.

I have projects from 20 years ago that I can still build because all of the dependencies (minus the compiler -- I'm counting on it being backwards compatible) are stored right in the source tree. Meanwhile, I can't do that with Ruby projects from several years ago because the gems have been removed. I've seen deployments come to a halt because no startup runs its own package-server mirror, and those servers go offline or a package gets deleted mid-deploy. The infamous left-pad incident broke a good chunk of the web, and that wouldn't have happened if the package had been fetched once and then added to an appropriate SCM. Every time we fetch the same package repeatedly from a package server, we're counting on it not having changed, because no one does any sort of verification any longer.
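
For Ruby specifically, Bundler can vendor the gems so they live in the repo; a sketch from memory, assuming a standard Gemfile setup:

    # copy all resolved gems into vendor/cache and commit them
    bundle cache        # older Bundler versions spell this `bundle package`
    git add vendor/cache Gemfile Gemfile.lock
    git commit -m "Vendor gem dependencies"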


SCC systems that handle big files don't suffer from the "you have to clone all the history and the entire repo all the time" problem that git suffers from. At least Perforce doesn't...

Git has its place, but it has really broken how the world thinks about SCC. There are other ways to approach it that aren't the ways git approaches it.


When you make a video game you want version control for your graphics assets, audio, compiled binaries of various libraries, etc. You might even want to check in compiler binaries and other things you need to get a reproducible build. Being able to chuck everything in source control is actually good. And being able to partially check out repositories is also good. There is no good technical reason why you shouldn't be able to put a TB of data under version control, and there are many reasons why having that option is great.


The versioned object storage solves nothing. If your colleagues need the files, they're going to have to get them, and it's going to be no quicker getting them from somewhere else. Putting them outside the VCS won't help. (For generated files, you may have options, and the tradeoffs of putting them in the VCS could be not worth it. But for hand-edited files, you're stuck.)

If the files are particularly large, they can be excluded from the clone, depending on discipline and/or department. There are various options here. Most projects I've worked on recently have per-discipline streams, but in the past a custom workspace mapping was common.
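
For readers who haven't seen one, the workspace-mapping trick is just an exclusionary line (the leading minus) in the client view; the depot paths here are invented for illustration:

    View:
        //depot/game/...                         //my_workspace/game/...
        -//depot/game/Assets/Cinematics/...      //my_workspace/game/Assets/Cinematics/...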


> Would you really want your colleagues spending a significant amount of time cloning your artifacts?

Not just the artifacts, but their entire history. That is a problem that Git has out of the box, but there is no reason it needs to work that way by default. LFS should be a first class citizen of a VCS, not an afterthought.
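
To be fair, git does have a partial answer today in partial clone filters, if the server supports them; the URLs below are placeholders:

    # blobless clone: full history and trees now, file contents fetched on demand
    git clone --filter=blob:none https://example.com/big-repo.git

    # or only skip blobs above a size threshold
    git clone --filter=blob:limit=10m https://example.com/big-repo.git

It's still opt-in rather than the default, which I think is the parent's point.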


So how would you version a game that needs assets? These files must be versioned but can be very big, for example long cutscene videos.

Some projects need the ability to version big files, there is a good reason why perforce exists and is widely used in the gaming industry.


I am not saying that it is a better UX, but hashed/versioned blobs on S3 would mostly work depending on tooling integration.


That's building a custom version control on top of the version control you're already using.


Not really, it is more like building a custom storage layer for your VCS.

You are still relying only on git as the source of truth for which artefacts belong to which version.
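
A minimal sketch of that storage layer, assuming the aws CLI and a bucket name I'm making up; git only ever sees the small pointer file:

    # content-address the blob, push it to S3, commit only the pointer
    hash=$(sha256sum assets/intro_cutscene.mp4 | cut -d' ' -f1)
    aws s3 cp assets/intro_cutscene.mp4 "s3://my-asset-bucket/blobs/$hash"
    echo "$hash" > assets/intro_cutscene.mp4.ptr
    git add assets/intro_cutscene.mp4.ptr

    # on checkout, read the pointer and fetch the blob it names
    aws s3 cp "s3://my-asset-bucket/blobs/$(cat assets/intro_cutscene.mp4.ptr)" assets/intro_cutscene.mp4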


Isn’t that essentially what git lfs is?


I believe so, but with a different UX. In almost every case I expect git lfs to be better, but I can see reasons to use more custom flows.


Git is designed with a strong emphasis on text source and patches. It simply isn't designed for projects with large assets like 3D animation, game dev, etc. Having said that, solutions like LFS, Annex and DVC (not git-specific) work really well (IMO). If you don't like that, there are solutions like Restic that can version large files reasonably well (though it's a backup program).
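
To make that concrete, the LFS flow is just a pattern in .gitattributes; roughly (file names made up, git-lfs assumed installed):

    git lfs install                     # one-time hook setup per machine
    git lfs track "*.mp4" "*.psd"       # route large binary types through LFS
    git add .gitattributes
    git add assets/intro_cutscene.mp4   # committed as a pointer; the blob goes to the LFS store
    git commit -m "Track large assets with LFS"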


This is an example of a more generic problem. We adopt some principle or practice for rational reasons, and then, as a mental shortcut, conflate it with taste, aesthetics, cleanliness. But no software or data is 'dirty' or 'ugly'; we feel it so because of mental associations, and intuition is unreliable: the original reasons may not apply, or may be less important.


Well, they're not _just_ that, right?

First, they can be differential forms, not only functions. Second, there's an important note that we don't look only at things over C. For example, specifically in the context of Fermat's Last Theorem, we need Hida's theory of p-adic families of modular forms. Much of the arithmetic of modular forms comes from the modular curves being algebraic and (almost) defined over the integers.
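
To make the differential-forms point concrete, the standard computation (which I'm sure you know, spelled out for other readers) is that the weight-2 transformation law is exactly what makes f(z) dz invariant:

    f\left(\frac{az+b}{cz+d}\right) = (cz+d)^k f(z), \qquad d(\gamma z) = \frac{dz}{(cz+d)^2} \text{ for } \gamma z = \frac{az+b}{cz+d},

    \text{so for } k = 2: \quad f(\gamma z)\, d(\gamma z) = (cz+d)^2 f(z) \cdot \frac{dz}{(cz+d)^2} = f(z)\, dz.

That is why weight-2 cusp forms correspond to holomorphic differentials on the modular curve.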


The above definition (analytic functions on a moduli space of elliptic curves) actually extends in a natural way. I didn't know what modular forms were before the parent comment, but I know algebraic geometry, so it is natural for me to extend the above definition to the cases you mention.

If modular forms are (global?) sections of the structure sheaf of the moduli space of elliptic curves, the differential-forms view is just the standard construction of the sheaf of 1-differentials. Similarly, since elliptic curves are easily defined over arithmetic fields, arithmetic modular forms will just be the same thing, but over C_p or something like that.

I actually might be totally off in the above, but I doubt I am: that’s the power of Grothendieck approach, where everything just falls into its natural place in the framework.


This definitely fits with Grothendieck's philosophy: he basically ignored all work in this area, implicitly claiming it was trivial, while some of his closest friends and most famous student made huge strides with actual hard work - not quite things falling into place. In fact, the paper most famously proving the Weil conjectures has as an explicit target the coefficients of a modular form, uses an inspiration from automorphic forms theory, and is infamously Grothendieck's greatest disappointment.

There is rich structure in this area of maths that goes well beyond just sections of some sheaf, or at least this is what Serre, Deligne, Langlands, Mazur, Katz, Hida, Taylor, Wiles and many others seem to think.


Oh, I did not mean to imply that the framework necessarily makes it so that the results open like a softened, rubbed nut, as Grothendieck said; I don't quite agree with that. For me, the benefit is rather in building a mental framework that facilitates understanding and puts seemingly disparate things into one coherent whole. The actual hard thinking and insights are still necessary; it ain't no royal road.


Still waiting to see the first new irrationality proof with these ideas, wishing you lots of good luck!


Actually, AFAIU the TLA+ proof only covers a few small cluster sizes - not all sizes. And the number of nodes in the painting is definitely above what was checked by TLA+...


I didn't know about task spooler. Is it better than using xargs with a parallel pool?

    xargs -L1 -P20 git clone --bare < repositories.txt


Yes, for me it is better, because your way requires keeping the ssh connection open until all of the git clones have finished, which in this case takes several hours.

(Or you could also run your way in tmux or screen.)

With task-spooler, all of the commands (in this case, the individual git clone commands for each of the repos) go into a queue and run independently of my ssh session, so I can quickly add a bunch of jobs to the queue and immediately disconnect.
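
Roughly, the queueing looks like this (the -S flag caps how many jobs run at once; the binary is tsp on Debian/Ubuntu, ts elsewhere):

    tsp -S 4                            # let up to 4 clones run in parallel
    while read -r url; do
        tsp git clone --bare "$url"     # each clone becomes a queued job
    done < repositories.txt
    tsp -l                              # inspect the queue; safe to log out now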


I have a hot take on this, which I hope will resonate with at least a few people: duplication, even of blocks of up to a few long statements, rarely bothers me, because I remember all the duplications as a single instance. I have an extraordinary memory, and this makes a huge difference in how I think of and write code. Or anything, really. I save everything I've ever written, like bash history, but everything, and refer back to it and copy-paste somewhere else. I wonder if anyone else has this. It doesn't affect how I think of production code, but it hugely affects my workflow.

