Hacker Newsnew | past | comments | ask | show | jobs | submit | mariarmestre's commentslogin

Hmmm... this is not eligible for their zero data retention policy anymore. Not sure how this will go down.


They added a clause that the schemas themselves aren't covered by ZDR. Their policy for the prompts appears unchanged.

https://platform.openai.com/docs/models/default-usage-polici...


Are you going to keep maintaining the package?


We intend to.


I just tried it on an AI newsletter: https://a.tldrnewsletter.com/web-version?ep=1&lc=f08d3180-da... and it turns out I'm not keeping up with AI news as well as I thought.



Our software changes submitted links to canonical URLs when it finds them, and that page has the canonical URL https://community.intel.com/t5/Blogs/Tech-Innovation/Cloud/b....

I've fixed it above now.


Thank you! Our goal is to make production-ready code, so we believe that good documentation and stability of the code are paramount. I'll pass the feedback along to the rest of the team :)


This represents a major rewrite of the package with more powerful features than ever.

We have introduced the new concept of components. Components are composable and customisable, and can be connected into pipelines. Pipelines are dynamic execution graphs that support everything from simple linear chains to more complex flows containing loops. This means you can get started easily with a few lines of code, but still have room to extend and customise the pipeline's logic.
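To make the idea concrete, here is a minimal sketch of the component/pipeline pattern described above. All names here (`Component`, `Pipeline`, `Splitter`, `Counter`) are illustrative assumptions for this comment, not the package's actual API.

```python
# Hypothetical sketch of composable components wired into a pipeline.
# Names and signatures are illustrative, not the real package API.

class Component:
    """A reusable processing step: consumes an input, produces an output."""
    def run(self, data):
        raise NotImplementedError

class Splitter(Component):
    """Example component: split text into tokens."""
    def run(self, data):
        return data.split()

class Counter(Component):
    """Example component: count the items it receives."""
    def run(self, data):
        return len(data)

class Pipeline:
    """A linear chain of components. The real concept generalises this
    to a dynamic execution graph that can also contain loops."""
    def __init__(self):
        self.steps = []

    def add(self, component):
        self.steps.append(component)
        return self  # allow chaining

    def run(self, data):
        for step in self.steps:
            data = step.run(data)
        return data

pipe = Pipeline().add(Splitter()).add(Counter())
print(pipe.run("retrieval augmented generation"))  # 3
```

The point of the pattern is that each step is swappable in isolation, while the pipeline object owns the execution order.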

This restructuring of the package paves the way to building truly extensible and composable AI systems ready for production.

The team will be around for questions!


Thank you for this! Congrats on the beta release. I guess this is not totally battle-tested yet then?


No, but it's a real product.


I think a fundamental issue with search, and the reason why many companies do not invest in tuning a good search experience, is that the main metric is usually minimising embarrassing/irrelevant results rather than surfacing the best possible set of results. How can you even know what the best answer to your query is? Systematic evaluation is very hard.


If you control the browser your results are displayed in, you can monitor clicks and time spent on each document to generate a pretty good signal. If someone opens a document and looks at it for fifteen minutes, you should be fairly convinced it was useful.
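A simple version of that click-plus-dwell-time heuristic can be sketched as follows. The field names and the 30-second/15-minute thresholds are assumptions for illustration, not from any particular product:

```python
# Illustrative sketch: turning click + dwell-time logs into a relevance signal.
# Field names and thresholds are assumed for this example.

def relevance_signal(events, min_dwell_seconds=30, max_dwell_seconds=900):
    """Score each (query, doc) pair between 0 and 1.

    A document opened but abandoned quickly scores 0; dwell time
    scales the score up to 1.0 at max_dwell_seconds (15 minutes),
    which we treat as strong evidence the result was useful.
    """
    signals = {}
    for e in events:
        key = (e["query"], e["doc"])
        if e["dwell_seconds"] >= min_dwell_seconds:
            score = min(e["dwell_seconds"] / max_dwell_seconds, 1.0)
            signals[key] = max(signals.get(key, 0.0), score)
        else:
            signals.setdefault(key, 0.0)  # clicked, but bounced
    return signals

events = [
    {"query": "tax form", "doc": "a.pdf", "dwell_seconds": 900},  # 15 min: strong
    {"query": "tax form", "doc": "b.pdf", "dwell_seconds": 5},    # bounce: weak
]
print(relevance_signal(events))
# {('tax form', 'a.pdf'): 1.0, ('tax form', 'b.pdf'): 0.0}
```

Aggregated over many users, scores like these can feed directly into ranking evaluation or training data, which is exactly the signal hand-tuned relevance judgements struggle to provide at scale.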


The only issue here is what I mentioned in the comment above: how easy is it to read and parse the content of said website, and is it legal to read the content programmatically? Do you have any website(s) in mind?


Thanks so much for your comment! You're right that this annotation tool can be used on any form of free-form document found online. I tackled Wikipedia first because it was an obvious first choice and they have an API to read the HTML. This could be opened up to other sources of data, but I also do not want this to become a scraping tool, so we would need to weigh the costs/benefits of adding new data sources. The additional cost of adding a new source is mostly about how difficult it is to read and parse the content. In the future, I could integrate with some paid sources (e.g. news publications), where people would pay for the content they scrape and label.

I have a pitch deck and I'm looking for all the things you mentioned :-). I can send the pitch deck to anyone interested.

