Hacker News | mdaniel's comments

And since the other two links are to GH: https://github.com/Skyvern-AI/skyvern (AGPLv3)

https://github.com/reworkd/tarsier/pull/115/files represents someone who does not know what git is used for

  Cloning into 'tarsier'...
  remote: Enumerating objects: 15238, done.
  remote: Counting objects: 100% (1613/1613), done.
  remote: Compressing objects: 100% (929/929), done.
  Receiving objects: 100% (15238/15238), 3.01 GiB | 14.82 MiB/s, done.

> and efficient

citation needed

https://xmpp.org/rfcs/rfc6121.html#session


Almost all steps in this example session are optional. But even clients that do everything are still more efficient than most other chat apps people use: https://blog.lewman.com/internet-messaging-versus-congested-...

> Comments aren't signed but change meaning of document

Do you have an example of that assertion handy? The only comment-influences-execution behavior I'm aware of is in SQL[1], and I haven't ever seen any XML system (in any business domain) which does what you said

1: I mean, setting aside linter suppression, which pedantically does impact execution but I meant of the final software


https://duo.com/blog/duo-finds-saml-vulnerabilities-affectin... has the full details.

But basically, in some XML APIs, a comment can split a single text node into two adjacent text nodes. Some implementations would only look at the first text node. The original XML Signature spec (although I think this has been changed) said to remove all comments from the document before signing it, so an attacker can add arbitrary comments without invalidating the signature.
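A minimal sketch of the text-node split, using Python's stdlib minidom. The element name and addresses are made up for illustration and are not taken from the Duo write-up:

```python
from xml.dom import minidom

# Hypothetical SAML-like fragment; element name and addresses are invented.
XML = "<NameID>admin@example.com<!-- injected -->.attacker.test</NameID>"


def first_text_only(element):
    """Naive read: look only at the first child text node."""
    return element.firstChild.data


def all_text(element):
    """Safer read: concatenate every text node, skipping comment nodes."""
    return "".join(
        node.data
        for node in element.childNodes
        if node.nodeType == node.TEXT_NODE
    )


name_id = minidom.parseString(XML).documentElement
naive = first_text_only(name_id)  # stops at the comment
full = all_text(name_id)          # what the signed text amounts to once comments are stripped

print(naive)
print(full)
```

The comment leaves three child nodes (text, comment, text), so code that reads only the first one sees a different identity than code that reads all of them.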


Being a toml-n00b, why is this quoted? https://github.com/CerebriumAI/examples/blob/85815f8e09e9e77...

Related to that, it seems the syntax isn't documented https://docs.cerebrium.ai/cerebrium/environments/config-file...


Do you mean why the individual file names aren't quoted?

You can see an example config file at the bottom of that link you attached - agreed we should probably make it more obvious


heh, I don't need an example in the docs, the whole repo is filled with examples, but unless you expect some poor soul to do $(grep -r ^include . | sort | uniq) and guess from there, what I'm saying is that the examples -- including the bare bones one in your documentation -- do not SPECIFY what the glob syntax is. The good thing about standards is that there are so many to choose from, so: python's glob module, golang's glob, I'm sure rust-lang has one, bash, ... I'm sure I could keep going
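To make the "which glob?" point concrete, here is a small sketch showing two matchers from Python's own standard library disagreeing on the same pattern. The path and pattern are hypothetical and say nothing about which dialect Cerebrium actually uses:

```python
import fnmatch
from pathlib import PurePosixPath

# Hypothetical path and pattern, chosen only to show the disagreement.
path = "src/pkg/mod.py"
pattern = "src/*.py"

# fnmatch: "*" matches any run of characters, including "/".
fnmatch_result = fnmatch.fnmatchcase(path, pattern)

# PurePosixPath.match: the pattern is compared segment by segment,
# so "*" never crosses a "/" boundary.
pathlib_result = PurePosixPath(path).match(pattern)

print(fnmatch_result, pathlib_result)
```

Same language, same pattern, two answers, which is why a config format needs to say which dialect its include patterns follow.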

As for the quoting part, it's mysterious to me why a structured file would use a quoted string for what is obviously an interior structure. Imagine if you opened a file and saw

  fred = "{alpha: ['beta', 'charlie''s dog', 'delta']}"
wouldn't you strongly suspect that there was some interior syntax going on there?

Versus the sane encoding of:

  fred:
    alpha:
    - beta
    - charlie's dog
    - delta
in a normal markup language, no "inner/outer quoting" nonsense required

But I did preface it with my toml n00b-ness and I know that the toml folks believe they can do no wrong, so maybe that's on purpose, I dunno


> Doesn't seem like a good idea for extensions to be accessing local resources though.

To the best of my knowledge all localhost connections are exempt from CORS and that's in fact how the 1Password extension communicates with the desktop app. I'd bet Bitwarden and KeePassXC behave similarly


That it goes in the opposite direction of your cited project (run modern-ish python from within the JVM), and almost certainly has a much, much better JIT story than yours

> Finic uses Playwright to interact with DOM elements, and recommends BeautifulSoup for HTML parsing.

I have never, ever understood anyone who goes to the trouble of booting up a browser, and then uses a python library to do static HTML parsing

Anyway, I was surfing around the repo trying to find what, exactly "Safely store and access credentials using Finic’s built-in secret manager" means


We're in the middle of putting this together right now but it's going to be a wrapper around Google Secret Manager for those that don't want to set up a secrets manager themselves.

Oftentimes websites won't load the HTML without executing JavaScript, or they use JavaScript running client-side to generate the entire page.

I feel that we are in agreement for the cases where one would use Playwright, and for damn sure would not involve BS4 for anything in that case

What would you recommend for parsing instead?

In this specific scenario, where the project is using *automated Chrome* to even bother with the connection, redirects, and bazillions of other "browser-y" things to arrive at HTML to be parsed, the very idea that one would write `soup = BeautifulSoup(page.content())` is crazypants to me

I am open to the fact that html5lib strives to parse correctly, and good for them, but that would be the case where one wished to use python for parsing to avoid the pitfalls of dragging a native binary around with you


I think there's some misunderstanding? Sometimes parsing HTML is the best way to get what you need, however there are many situations where one must use something like playwright to get the HTML in the first place (for example, the html is generated clientside by javascript). What's the better alternative?

Yes, there is for sure some misunderstanding. Of course parsing HTML is the best way to get what you need in a thread about screen scraping using browser automation. And if the target site is the modern bloatware of <html><body><script src=/17gigabytes.js></script></body></html> then for sure one needs a browser (or equivalent) to solve that problem

What I'm saying is that doing the equivalent of

  chrome.exe --headless --dump-dom https://example.com/lol \
    | python -c "import bs4; print('reevaluate life choices that led you here')"
is just facepalm stupid. The first step by definition has already parsed all the html (and associated resources) into a very well formed data structure and then makes available THREE selector languages (DOM, CSS, XPath) to reach into that data structure and pull out the things which interest you. BS4 and its silly python friends implement only a small fraction of those selector languages, poorly. So it's fine if a hammer is all you have, but to launch Chrome and then revert to bs4 is just "what problem are you solving here, friend?"
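A rough sketch of the alternative, assuming Playwright's Python sync API: ask the already-running browser to run the selector instead of dumping HTML and re-parsing it. The function names and the "a[href]" selector are illustrative choices, not anything from the Finic repo:

```python
def extract_links(page):
    """Return (text, href) pairs using the browser's own CSS selector engine.

    `page` is expected to be a Playwright Page (or anything exposing the
    same eval_on_selector_all method).
    """
    rows = page.eval_on_selector_all(
        "a[href]",
        "els => els.map(e => [e.textContent.trim(), e.href])",
    )
    return [(text, href) for text, href in rows]


def scrape_links(url):
    """Launch Chromium, load `url`, and pull the links out of the live DOM."""
    # Imported here so the helper above works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url)
        links = extract_links(page)
        browser.close()
    return links
```

Calling scrape_links("https://example.com/") would return the link pairs straight from the DOM the browser already built, with no second HTML parser involved.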

In python specifically I like lxml (pretty sure that's what BS uses under the hood?), parse5 if you're using node is usually my go to. Ideally though you shouldn't really have to parse anything (or not much at all) when doing browser automation as you have access to the DOM which gives you an interface that accepts query selectors directly (you don't even need the Runtime domain for most of your needs).

> pretty sure that's what BS uses under the hood?

it's an option[1], and my strong advice is to not use lxml for html since html5lib[2] has the explicitly stated goal of being WHATWG compliant: https://github.com/html5lib/html5lib-python#html5lib

1: https://www.crummy.com/software/BeautifulSoup/bs4/doc/#insta...

2: https://pypi.org/project/html5lib/


That's good to know, will try it out. I haven't had many cases of "broken" html in projects where I use lxml but when they do happen it can definitely be a pain.

Almost any process that involves the word "workflow" (my mental model is one where the user would press alt-tab to look up something else in another window). The very, very common case would be one where they have a stupid SMS-based or "click email link" login flow: one would not wish to do that a ton, versus just leaving the session authenticated for reuse later in the day

Also, if my mental model is correct, the more browsing and mouse-movement telemetry those cloudflare/akamai/etc gizmos encounter, the more likely they are to think the browser is for real, versus encountering a "fresh" one is almost certainly red-alert. Not a panacea, for sure, but I'd guess every little bit helps


The way we plan to handle authenticated sessions is through a secret management service with the ability to ping an endpoint to check if the session is still valid, and if not, run a separate automation that re-authenticates and updates the secret manager with the new token. In that case, it wouldn't need to be stateful, but I can certainly see a case for statefulness being useful as workflows get even more complex.
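A minimal sketch of that check-then-reauthenticate flow, with a plain dict standing in for the secret manager. The function name, the callbacks, and the key are all hypothetical and are not Finic's actual API:

```python
from typing import Callable, Dict


def get_valid_token(
    store: Dict[str, str],
    key: str,
    is_valid: Callable[[str], bool],
    reauthenticate: Callable[[], str],
) -> str:
    """Return a usable session token, refreshing the stored one if needed.

    `store` stands in for the secret manager, `is_valid` for the
    "ping an endpoint" check, and `reauthenticate` for the separate
    login automation.
    """
    token = store.get(key)
    if token is not None and is_valid(token):
        return token
    # Missing or expired: run the login flow and persist the new token.
    token = reauthenticate()
    store[key] = token
    return token


# Example: the stored token has expired, so the login automation runs once.
secrets = {"example-site": "expired-token"}
token = get_valid_token(
    secrets,
    "example-site",
    is_valid=lambda t: t == "fresh-token",
    reauthenticate=lambda: "fresh-token",
)
print(token)
```

The caller stays stateless: the only thing carried between runs is whatever sits in the store.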

As for device telemetry, my experience has been that most companies don't rely too much on it. Any heuristic used to identify bots is likely to have a high false positive rate and include many legitimate users, who then complain about it. Captchas are much more common and effective, though if you've seen some of the newer puzzles that vendors like Arkose Labs offer, it's a tossup whether the median human intelligence can even solve them.


I have to be very disciplined about switching my layout back to US English before launching games because they seem insistent on using the mapping for WASD versus the keys for WASD. But, worse (IMHO): some games use the keys and thus I can't predict whether the game was coded correctly or not and thus I just gave up and made it part of my launch process

I have secret desires to convert one of my old MacBook Pro devices into a Steam-via-Proton setup to get out from under the tyranny of Windows but since gaming is designed to be a break from "work" ... that's why it's not already done


Your computer doesn't actually know what the keys for WASD are. It just receives a number (commonly called scan code) for the pressed key from the keyboard, and has to use the mapping to determine which key that actually was.

There's some convention for which scan code to use for which physical position on a keyboard, but that's not correlated with what's actually printed on the key caps. E.g. on common QWERTY keyboards the "A" key will have scan code 4, but on AZERTY keyboards it will have scan code 20.

Games can probably get away with listening for the scan codes of the keys in the position commonly used by WASD, but it's a bit fragile, and they can't actually tell you what's printed on the keys they're listening to. The lack of consistency is certainly annoying, though...


> Games can probably get away with listening for the scan codes of the keys in the position commonly used by WASD, but it's a bit fragile, and they can't actually tell you what's printed on the keys they're listening to. The lack of consistency is certainly annoying, though...

The operating system knows how to map scan codes to characters based on the keyboard mapping the user has selected.

    Win32: MapVirtualKeyExA / MapVirtualKeyW
    MacOS: CGEventKeyboardGetUnicodeString / UCKeyTranslate
    Linux: xkb_state_key_get_utf8
Hell, glfw nicely wraps all of these up with glfwGetKeyName.

Stop reading character codes directly. Use the scan code, and when displaying a binding, map it through the system API so people see the right freaking key. It's not rocket science.


Scancode vs character code is what they are describing:

Games SHOULD use scancodes for positional mappings (like WASD) and rely on the system keymap to decide what letter to display them as. There is no "probably" here, the scancode is the scancode and the keymap is the keymap.

Games OFTEN use the character codes directly, regardless of whether it's for a positional mapping or not. This requires explicit support in the game for each possible system keymap, otherwise you end up with nonsense mappings.
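A toy sketch of that split: bind actions to scancodes, and only consult the keymap when showing a prompt. It assumes the numbers are USB HID usage IDs (which matches the 4 vs 20 mentioned upthread), and the two layout tables are hand-written stand-ins for what the OS keymap API would return:

```python
# Actions are bound to physical positions (USB HID usage IDs for the
# keys that sit where W, A, S, D are on a US QWERTY board).
MOVE_BINDINGS = {26: "forward", 4: "left", 22: "back", 7: "right"}

# Hand-written stand-ins for the system keymap: scancode -> key cap label.
LAYOUTS = {
    "us-qwerty": {26: "W", 4: "A", 22: "S", 7: "D", 20: "Q", 29: "Z"},
    "fr-azerty": {26: "Z", 4: "Q", 22: "S", 7: "D", 20: "A", 29: "W"},
}


def action_for(scancode):
    """Input path: layout-independent, looks only at the physical key."""
    return MOVE_BINDINGS.get(scancode)


def prompt_for(action, layout):
    """Display path: translate the bound scancode through the active layout."""
    for scancode, bound_action in MOVE_BINDINGS.items():
        if bound_action == action:
            return LAYOUTS[layout][scancode]
    raise KeyError(action)


for layout in LAYOUTS:
    labels = [prompt_for(a, layout) for a in ("forward", "left", "back", "right")]
    print(layout, "".join(labels))
```

The same four physical keys drive movement on both layouts; only the on-screen prompt changes (WASD on one, ZQSD on the other).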


We've invested some time to improve this situation for the games we port, at least to some extent. Keyboard input is read as a combination of scan code + win32 virtual key information to determine the physical key pressed on your keyboard. This way the keybindings are (almost) independent of the keyboard layout.

However, we also reflect your current key bindings with dynamic button prompts that tell you which key to press in game. For this we translate the determined physical key back through the selected keyboard layout to figure out what the corresponding label on your keyboard might be.

Most of this is just win32 API shenanigans, but the following post provides a bit more detail on reading inputs this way.

https://blog.molecular-matters.com/2011/09/05/properly-handl...


Not sure about MacBook Pro, but I converted my Windows 11 gaming desktop to Linux and Steam with Proton works just fine for all the games I care about. The only game that didn't play was Starfield and that was fixed after a few weeks.

This is the kind of slippery problem I would never think about but would make me want to break my keyboard if I encountered it in real life.

I am very lucky that Win+spacebar switches, so it's low drama to execute, just some drama to remember. Insult to injury, and likely very related to the article's points: if I remember after launching the game, Win+spacebar is 50/50 on whether the game notices and thus I usually just pay the startup cost again if I forget rather than having things be in limbo

> I am very lucky that Win+spacebar switches, so it's low drama to execute, just some drama to remember.

Unless they've fixed this in the last few years, switching your input mode like that while World of Warcraft is running will cause some kind of crash that prevents you from using the in-game chat. This felt especially egregious because switching input modes is the kind of thing you want to do all the time if you're using the in-game chat.


There's a lot to be said for a consistent process that always fixes a problem, vs one that fixes the problem relatively permanently, inconsistently, and requires more memory.

That's easier than just changing the key bindings in game?

Option 1: remember to press Win+spacebar before launching game

Option 2: launch game, navigate to settings, write down on a sheet of paper what the current keybindings are (since once I press "." for "e" it's going to either whine or blank out whatever "." was mapped to when I started), repeat that exercise about 25 times per game, times the 15 games I have in rotation right now, feel that was a great use of my "downtime"

Option 2 has the added bonus of making it 50/50 whether the game help text knows I remapped, and thus says "press . to open door" or whether it continues to say "press e to open door" and I have to guess whether it means "their e" or "my e"


The way games ought to do keymapping is to just allow conflicts. If you map two functions to the same key, that key should do both things, until the player sorts it out. The keymapper can put a warning sign up, "conflicts with foobar", but it shouldn't remove the key from foobar, and it shouldn't say "I can't allow you to do that", either.
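A small sketch of a keymapper that behaves that way. The class and method names are made up for illustration:

```python
class KeyMapper:
    """Keymapper that tolerates conflicts instead of rejecting or unbinding."""

    def __init__(self):
        self.bindings = {}  # action -> key

    def bind(self, action, key):
        """Bind `action` to `key`; return conflicting actions as a warning."""
        self.bindings[action] = key
        return self.conflicts(action)

    def conflicts(self, action):
        """Other actions currently sharing this action's key."""
        key = self.bindings[action]
        return sorted(
            a for a, k in self.bindings.items() if k == key and a != action
        )

    def actions_for(self, key):
        """Every action fired by `key`; a conflicted key fires all of them."""
        return sorted(a for a, k in self.bindings.items() if k == key)


mapper = KeyMapper()
mapper.bind("open_door", "e")
mapper.bind("foobar", ".")
warning = mapper.bind("open_door", ".")  # rebinding onto a key already in use
print("conflicts with", warning)
print(mapper.actions_for("."))
```

Nothing is silently unbound: "foobar" keeps its key, the rebind succeeds, and the returned list is what the UI would show next to the warning sign.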
