Why doesn’t Windows use 64-bit virtual address space below 0x00000000`7ffe0000? (microsoft.com)
206 points by signa11 on Dec 18, 2022 | 62 comments



Tangent: on 64-bit Apple platforms, nothing is mapped in the lowest 4GiB of address space. This ensures any pointer truncated to 32 bits and dereferenced will segfault.


This was only 'enforced' by the linker on x86. You could handcraft a Mach-O executable to get around it. But it is now enforced by OSX on ARM.

I don't see any good reason for this. My own half-assed explanation is that they're enforcing 32b cleanliness. It's not a very good explanation. But it's my explanation for something I don't like.


When migrating 32bit apps to 64bit platforms, it's common to follow the recipe:

1. Try to recompile on 64bit without any changes.
2. Realize the program does not work because the types have different sizes now.
3. Cast everything on sight to types that have the same size as in 32bit.
4. Now the program "works" (as in, it compiled and maybe ran, but you didn't test thoroughly).

This address space restriction ensures that if you happen to cast a pointer this way, it will not work, forcing the developer to review what he's doing.
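To make step 3's failure mode concrete, here's a minimal sketch (my own illustration, nothing from the article) of the kind of cast that an unmapped low 4GB turns from silent corruption into an immediate crash:

  #include <stdint.h>
  #include <stdio.h>
  #include <stdlib.h>

  int main(void) {
      int *p = malloc(sizeof *p);   /* some ordinary 64-bit heap pointer */
      *p = 42;

      /* The step-3 "fix": squeeze the pointer through a 32-bit integer. */
      unsigned int truncated = (unsigned int)(uintptr_t)p;
      int *q = (int *)(uintptr_t)truncated;

      /* On a platform that maps nothing below 4GiB, this dereference
         faults right away instead of silently reading the wrong data. */
      printf("%d\n", *q);
      return 0;
  }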


This argument is similar to my 32b cleanliness explanation. Really, that was an issue back in the 90s. But this enforcement happened with the x86_64 to ARMv8 migration, 64b to 64b.

So I would like an explanation from Apple for why this is happening now. What is their rationale?


The first iPhones were AArch32, and there was likely a sizable contingent of iOS software that was written to target the only existing bitness at the time :)

The same reasons GP offered still apply there. The amount of software directly ported from x86_64 to AArch64 was probably minimal.


> 2. Realize the program does not work because the types have different sizes now

Such an enormous amount of time in C/C++ programming is devoted to accounting for the size of an int, long, wchar_t, and long long being unreliable when porting.

This is why D has int fixed at 32 bits, long fixed at 64 bits. It's amazing how the problems just melt away.

Now, just remember to use size_t for everything used as an index, and you're good to go. (ptrdiff_t is almost never needed.)


Isn’t it similar in C with using uint16_t, uint32_t, etc?

IMO it helps to write programs from the ground up that you know will need to be cross compiled.


Yes, you could use int32_t. But who wants to type that? People default to convenience, and that's int.

Besides, the C Standard Library doesn't even use stdint.h. Now you've got unknown implicit integer conversions going on. It's just not a solution.
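A concrete example of the kind of silent implicit conversion I mean (just a sketch, assuming a 64-bit target where size_t is 64 bits):

  #include <stdint.h>
  #include <stdio.h>
  #include <string.h>

  int main(void) {
      const char *s = "hello";

      /* strlen returns size_t, not a stdint.h type, so the "portable"
         fixed-width variable silently truncates on a 64-bit target.
         No diagnostic by default. */
      uint32_t len = strlen(s);

      /* And printing a fixed-width type portably means dragging in
         the PRIu32 macro from inttypes.h, or casting. */
      printf("%u\n", (unsigned)len);
      return 0;
  }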


> 3. Cast everything on sight to types that have the same size as in 32bit.

Which is why on Windows they defaulted to 32 bits for ints.


Except for 16 bit Windows!


You didn’t need to handcraft it; the linker has a ‘pagezero_size’ argument which allows you to set it as small as one page, and then the rest of the low 4GB can be used.


I remember trying this repeatedly with various incantations but always getting:

  segaddr blah conflicts with pagezero_size
Maybe there was an incantation I didn't find. But I did handcraft it and then Apple cancelled even that. Now we're gonna have to deal with a 4GB zero page. Period.


That's going to break certain old optimizations, I guess. But I hope to avoid arm64 Macs (kinda campaigning to get Linux machines at work), so it may remain only theoretical.


Maybe they're now relying on it for safety in core system libraries that get linked into your process? If you can be sure that the offset in *(ptr + offset) is always a 32-bit unsigned type, you get null pointer checking for free, right?
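Roughly what I'm picturing, as a sketch (hypothetical types, and assuming the guarantee that nothing below 4GiB is mapped):

  #include <stdint.h>

  /* Hypothetical object handed to a system library. */
  typedef struct {
      char payload[64];
  } obj_t;

  uint8_t read_byte(const obj_t *base, uint32_t offset) {
      /* If base is NULL, base + offset still lands somewhere in the
         first 4GiB (offset tops out at 2^32 - 1), which is guaranteed
         unmapped, so the load faults instead of touching real memory.
         No explicit NULL check needed on the hot path. */
      return ((const uint8_t *)base)[offset];
  }

(Pointer arithmetic on NULL is of course UB at the C level; the point is the machine-level guarantee.)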


Can't one just remap the zero page?


No, you cannot. You can't remap it and you can't resize it. This is part of the design assumptions baked into a couple of newish Apple technologies, ASLR and Address Signing.


I enjoy the short and sweet blog posts from Raymond Chen on "Why does Windows X?".


He is great at summing up these things in a way I can understand (most of the time, not being very experienced with Windows internals) without the feeling that he is dumbing it down; he is just explaining it very clearly.


While this article doesn’t have it, I especially like his subtitles which are essentially a 1-line TL;DR of the content. As a recent example:

Title: Instead of a C++ template parlor trick, why not just add support based on whether the header file has already been included?

Subtitle: Header file inclusion order dependencies


Has anyone here ever used Windows NT on a DEC Alpha?

A company in Ireland I worked for gave a mobile phone GIS demo to the Irish national phone company, and Digital lent us a beefy server with two AXP CPUs (it was gonna be the future!) running Windows NT. We literally pulled it over cobbled Dublin streets on a handcart to bring it to their offices.


I had a Customer in 2000 using SQL Server 7.0 on Windows NT 4.0 on an Alphaserver 1200. The performance, as compared to contemporary x86 machines, was very good. Was it worth the cost? I don’t know. I never did a dollars to performance benchmark. I suspect it wasn’t really worth the DEC tax.


In the mid-90's, I worked at a place that had one used for our intranet server running Win NT with IIS.

Mostly used a DEC Alpha as a "visible" competitor to our major server provider at the time (Compaq) to try and manage the prices they were charging (at least until Compaq bought Digital, and then eventually was bought by/merged with HP).

One interesting thing was that we had a faulty RAM chip that we only discovered due to a bug in website publishing to IIS causing a big memory leak which crashed the server. Once the chip was replaced, we still had the underlying leak - but at least the server didn't crash any more.


Me, as a QA at Microsoft. I assume other groups in Microsoft at the time were running it too. I wrote a bit about that here:

https://news.ycombinator.com/item?id=27639461


I had one for running very early builds of Win64 at Microsoft. It was a very aesthetically pleasing piece of hardware. The builds of Windows I ran on it were truly at the stage of "hey it built" so I remember getting a dialog one time that said something like "Invalid Error Code" with half the warning icon rendered and the other half random garbage. We moved on to IA64 and AMD64 shortly afterwards.


Yes – back in the 90s, my first tech job was doing QA for a COBOL vendor and we had that on our support matrix. The Alpha processors were really fast, but did you ever know there was A VERY SERIOUS COMPUTER in that room, with the industrial fans and a case like a filing cabinet which probably weighed 50 pounds empty. We even had an Alpha-on-Linux build for a single customer for a time.


I recall my father mentioning having an Alpha with Windows NT4 available at work, circa 1998. This was probably explicitly to run computational packages like ANSYS. It wasn't his main work machine, which AFAIK was something more like a random Pentium, maybe a Pentium Pro (doubtful - I think there was a dual Pentium Pro which was similarly a shared machine but not a daily driver).


Only once, for around a week. We purchased an AlphaServer and it didn't come with the correct OSF/1 installation media, so one of the guys installed NT4 just to see what it was like - we were all impressed as it ran multiple copies of Doom without issue.


In 1998-99, I worked at a company where we had an Alpha running Windows NT Server. The company was a large DEC customer and we had tons of Alpha servers sitting around.


I worked briefly on a real-estate web system which ran on it. It wasn't bad, as these things go (I was coming off OSF/1).


Hand delivering servers and other gear is always a part of a good story. Makes me wish for a decent cart and maybe some brakes.


Back around 2001, I bought a secondhand Alpha ATX mobo and for a lark installed NT 4 on it. I didn’t do much with it though.


What was the overlap of both products being commercially available? I didn't think it was too long.


Wikipedia says it was only commercially available for NT4. I do somewhere have the DEC Alpha Windows 2000 beta discs (it got killed very late in the beta).


A friend of mine had an Alpha, and I was getting the Windows betas (I got 98SE, 2000, and XP, and then nothing else). My friend was really happy to 'borrow' the Win2k for Alpha discs. IIRC, release candidates 1 and 2 had Alpha discs, and then 3 and beyond didn't; but I might be off by one. I assume it would have shipped as part of the release if Alpha hadn't been cancelled a couple months later.


IIRC every RC disk had Alpha binaries; only the RTM and later didn't - because the decision to can Alpha was announced in the short window between the last RC and RTM, rather abruptly for everyone involved on the project.


https://www.theregister.com/2000/12/27/win2k_for_alpha_alive...

Seems to indicate that the Release Candidate 3 builds for Alpha never made it out through official channels. I'm pretty sure the betas and RCs were one disc per flavor and architecture though: so x86 Professional, x86 Server, etc., and Alpha Pro/Server/etc. A big bundle of discs would come from Redmond.

I know NT4 had all the archs on the release disc, but I don't think that was the plan for Win2k.


> That is a special page of memory that is mapped read-only into user mode from kernel mode, and it contains things like the current time, so that applications can get this information quickly without having to take a kernel transition. This page is at a fixed location for performance reasons.

I didn't know there was a vdso-like mechanism in Windows like there is in Linux.


Yeah, it's called KUSER_SHARED_DATA, and contains a lot of interesting items.

https://learn.microsoft.com/en-us/windows-hardware/drivers/d...
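If you want to poke at it, here's a minimal sketch (the fixed user-mode address and the SystemTime offset follow the layout on that doc page, but treat my offsets as illustrative; a careful reader is supposed to loop until High1Time == High2Time for a consistent 64-bit read):

  #include <windows.h>
  #include <stdio.h>

  int main(void) {
      /* KUSER_SHARED_DATA is mapped read-only into every process at a
         fixed user-mode address. SystemTime is a KSYSTEM_TIME
         { ULONG LowPart; LONG High1Time; LONG High2Time; } at offset 0x14. */
      const volatile ULONG *low  = (const volatile ULONG *)(ULONG_PTR)0x7FFE0014;
      const volatile LONG  *high = (const volatile LONG  *)(ULONG_PTR)0x7FFE0018;

      ULONGLONG t = ((ULONGLONG)(ULONG)*high << 32) | *low;
      printf("SystemTime: %llu (100ns ticks since 1601)\n", t);
      return 0;
  }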


> No. The virtual address space starts at the 64KB boundary. You can confirm this by calling GetSystemInfo and checking the lpMinimum­Application­Address. It will be 0x00000000`00010000.

Aw, man. I recently found out there is also a similar restriction on Linux. Bummer, because I have a good use for putting code into the lower 64KB of memory (not 4KB, still want to use that for catching NPEs). My use case is a fast interpreter for bytecode. Since every bytecode handler needs an entry in the dispatch table, and I have 7--yes 7!--dispatch tables, I'd like to make their entries 2 bytes. So instead of taking 1KB each, they would take only 512B each. But alas.

Another great use case is Wasm. Since the memory is a sandboxed 4GB range and only indexed into by 32-bit offsets, placing it at virtual address 0 would save an add instruction (or use of segment register) on every access.
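(For what it's worth, the check the quoted article mentions really is just a couple of lines; quick sketch, untested:)

  #include <windows.h>
  #include <stdio.h>

  int main(void) {
      SYSTEM_INFO si;
      GetSystemInfo(&si);

      /* On 64-bit Windows this prints 0x10000: the bottom 64KB is off
         limits, so no parking dispatch tables or handlers down there. */
      printf("lpMinimumApplicationAddress = %p\n", si.lpMinimumApplicationAddress);
      printf("lpMaximumApplicationAddress = %p\n", si.lpMaximumApplicationAddress);
      return 0;
  }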


> Since every bytecode handler needs an entry in the dispatch table, and I have 7--yes 7!--dispatch tables, I'd like to make their entries 2 bytes. So instead of taking 1KB each, they would take only 512B each. But alas.

Hmm. If the upper half of the handler addresses is the same, then why not store just the lower 2 bytes of each handler address in the dispatch table? If you're on x86, there's no add needed if the upper bytes of the register you're loading into already contain the upper half of the address, for example:

   mov eax, HANDLER_BASE   ; upper 16 bits shared by every handler
   mov ax, [edi + ebx]     ; overwrite the low 16 bits with the 2-byte table entry


Interesting idea. I think it means using another register to permanently hold HANDLER_BASE though? The dispatch sequence is only 4 instructions. I tried it as a delta encoding, requiring another add instruction on every dispatch, but that is an approx. 5% penalty. So I was hoping that by putting it at an absolute address < 64KB I could do the sequence without another add.


On Linux it depends on a sysctl parameter, and on 4kB-page platforms some runtimes used the fact that it was set to "one page" as a way to have very short pointers starting at address 4097 for things referenced very often (and which might need a guard page, like the NULL value).

Made for some borked code with Clozure Common Lisp when people started running a 64kB page size on PPC64: even with a minimal zero page, it was still too big to map the "low memory" where CCL stuck its NULL definition.
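The sysctl in question is vm.mmap_min_addr, and you can watch the refusal directly (rough Linux-only sketch):

  #include <errno.h>
  #include <stdio.h>
  #include <string.h>
  #include <sys/mman.h>

  int main(void) {
      /* Ask for one page at 0x1000. With the typical vm.mmap_min_addr of
         65536 this fails with EPERM for unprivileged processes; lower
         the sysctl (or have CAP_SYS_RAWIO) and it succeeds. */
      void *p = mmap((void *)0x1000, 4096, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
      if (p == MAP_FAILED)
          printf("mmap at 0x1000 failed: %s\n", strerror(errno));
      else
          printf("mapped a low page at %p\n", p);
      return 0;
  }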


What does the ` mean?


That’s the convention on Windows for denoting 64-bit addresses, separating the two 32-bit parts with the backtick. Don’t recall why now.


I assume it just makes the address easier to read. I imagine most programs don’t deal with address spaces larger than 4GB, so being able to quickly pick out the lower half when looking at an address is probably quite handy.


C++14 allows you to add apostrophes (') to numeric literals anywhere you want (presumably you do so at regular and rational intervals, but you don't have to).


They need to reserve the first 2147352576 bytes for the Color Graphics Adapter display buffer.


Has anyone tried loading COMMAND.COM in there? Surely CGA isn't using all that space.


tl;dr several reasons, the most specific of which is "to keep linker relocs simple"


…on a long dead architecture.


Chen is the poster boy for backwards compat.

They used to work very hard to make sure updates didn't break popular programs that used undocumented hacks.


> They used to work very hard to make sure updates didn't break popular programs that used undocumented hacks.

Obsessively.

I worked for a company that sold a security auditing tool. We cared very much about which version of Windows we were running on. When Microsoft came out with the next version of Windows, to our dismay we found out that our software recognized it as the old version.

Turns out that Microsoft had found (pre-release) that the new version broke our software. It reported "unknown version" or something. So they added us to a long list of applications that, when they asked what version of Windows they were running on, were lied to and told an earlier version.

It cost us some heartburn to work around Windows trying to be helpful to us...


I frequently implemented this from the other side at Apple.

You'd be shocked at how much software enables features based on a `==` OS version check. When the OS bumps its version, the software regresses to some earlier behavior. Lying to the app is the simplest way to keep it working.
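The canonical shape of the bug, as a self-contained toy (all names hypothetical):

  #include <stdio.h>

  /* Hypothetical stand-ins for an OS version query and two code paths. */
  static int  os_major_version(void)   { return 14; }  /* the OS just bumped */
  static void enable_new_feature(void) { puts("new feature on"); }
  static void use_legacy_path(void)    { puts("legacy path"); }

  int main(void) {
      /* The anti-pattern: gating on == instead of >=. Correct on 13,
         silently regresses to the legacy path on 14 and everything after,
         which is why the OS ends up lying about its version to the app. */
      if (os_major_version() == 13)
          enable_new_feature();
      else
          use_legacy_path();
      return 0;
  }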


I would argue they still are; Windows is dead in the water if it doesn't run the vast majority of Win32 software out there today.


I still have Windows 2000 installed on a Digital Ultimate Workstation (2x Alpha 21164). I turn it on every few years when I need a nostalgia kick ;-)


So sad to see Alpha die off, especially since I haven't noticed any architectures with anything like PALCode spring up since. The Alpha's firmware was essentially a hypervisor that only supported a single guest, and the OS kernel had to upcall to the firmware for any privileged operations.

In particular, it would be nice to have userspace programs be able to take advantage of new/larger registers without requiring the OS kernel to support the extra CPU state. If the firmware (which presumably is available as soon as the new CPU ships) handles the context switch, then the OS kernel doesn't need an update.


On the other hand, Alpha had an infamously weak memory ordering, and its dying means that we don't have to deal with that when writing software. Check Documentation/memory-barriers.txt in the Linux kernel for the gory details (recent versions were written after the kernel was changed to make Alpha work somewhat more like other architectures; check that file in older kernel versions for the full horror).


Yes, I'm familiar with the woes of the particularly weak memory model. It's mostly a problem when writing lock-free data structures, as mutex acquisition and release execute the necessary fence instructions to maintain proper happens-before relations.


PAL code didn't do anything like context switches and I'm not sure how it could, given how the OS needs to know the specifics of the thread context in many scenarios.


That's correct. On NT/Alpha, context switching was done in ntos\ke\alpha\ctxsw.s, which was regular kernel mode assembler code. A PAL call was made, but it simply set the new Teb and Pdr; two or three lines of code. I've never seen the PALcode for VMS or OSF/Tru64, but the NT PAL for swpctx was designed so that the Alpha assembler code in NT looked as much as possible like the MIPS code that davec wrote. It made the port easier that way.

Interesting Raymond continues to mention NT/Alpha now and then. If you run into him mention I have a collection of NT/Alpha artifacts and anecdotes he might be interested in.


I believe in the specific case of L4/Alpha, the PALCode did perform the context switches. Granted, I doubt it was as robust as the VMS or Tru64 PALCode versions that were production-ready.


PALcode implemented context switches for all systems, including Linux (which uses Digital Unix PALcode). Specifically, PALcode covered various internal details of the CPU, presenting a unified interface to the OS - each CPU needed a custom PALcode implementation, though IIRC there were only a few mandatory PALcode calls (related to memory barriers). The kernel covered switching the OS-specific structure pointers over; PALcode handled all sorts of MMU, TLB, PASID indexing of the TLB, internal CPU state, etc.

L4/Alpha PALcode was limited to the specific set of CPUs it was implemented for, so unlike the VMS, NT or Digital Unix PALcode it wasn't available outside of those few machines.



