Hacker News
MarketBrief (YC S11) Makes Obtuse SEC Documents Human-Friendly (techcrunch.com)
177 points by lyime on Aug 16, 2011 | 54 comments



This is one of those opportunities that seems blindingly obvious in hindsight - the amount of data being pushed out daily is way beyond what any of us can process, especially in the typical SEC language. Turning it into meaningful (or at least, understandable) info changes the dynamic entirely - I can skim a document now and know if it's worth finishing. And being able to pump high-value, legible, auto-generated financial news across to other networks is an opportunity that boggles the mind.

The mention of other sources (FDA, etc.) also makes a ton of sense; I hadn't thought of that. I wonder if we'll see a day when news is mostly generated, with humans acting mainly as editors?

One question: the included screenshot has the sentence, "I've included a few more notes from..." - is that auto-generated, or is that an editor's note?


I have been using SECWatch for a while and noticed that the email notifications I've been getting have been pointing to MarketBrief for the last few weeks. I didn't realize it was a YC company until now.

This is a product I have a personal need for, but I have to say that I'm disappointed with MarketBrief so far. The auto-generated "human readable" articles don't seem to provide much value. In fact, when I started to see them, I assumed they were some kind of SEO strategy, not something that was actually intended to be useful to me. I mean, why would I want to see a bunch of numbers embedded in a long-form prose template instead of a more easily digestible table?

Even more seriously, a lot of the SEC filings on MarketBrief don't seem complete. For example, I received an email notification today about this SEC filing:

http://marketbrief.com/pmic/8k/events-or-changes-between-qua...

The filing basically says that the company issued a press release, which is supposed to be attached. But the attached press release is nowhere to be found. If I go to the main SEC EDGAR website, the press release is easily accessible:

http://www.sec.gov/Archives/edgar/data/1453820/0000950123110...

I'm excited about the problem being tackled here, so I'll look forward to future improvements.


An API for the raw data underlying this system would be extremely useful to academics. Databases like Compustat and ExecuComp are expensive and lack some of the most interesting details in SEC documents. I have worked on trying to extract deep information from footnotes in financial statements (e.g. foreign cash holdings and option exercise tax shields) and found that even Mechanical Turk/CrowdFlower couldn't handle the complexity. If they can figure out an algorithm to pull out such data, they will have a great product for both academia and the private sector.


If by the raw data you mean the actual filings with the SEC, then the raw data is available via FTP from the SEC itself.

http://sec.gov/edgar/searchedgar/ftpusers.htm

Parsing the EDGAR documents is a mixed bag. Many of the older filings, and some of the more recent ones, are in plain text rather than HTML. Finding footnotes in the HTML is probably not that bad, but the issue is incomplete coverage: either you miss the footnote in the HTML, or the document is in plain text instead.
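
To illustrate the text-versus-HTML split, here is a minimal Python sketch. It assumes the EDGAR complete-submission layout in which each document sits inside <DOCUMENT> ... </DOCUMENT> with a <TYPE> line and a <TEXT> body. The function name split_documents and the "contains an <html> or <body> tag" rule are hypothetical heuristics, not anything MarketBrief or the SEC publishes.

```python
import re

# Assumed complete-submission layout: <DOCUMENT> blocks, each with a
# <TYPE> line and a <TEXT> body that is either HTML or plain text.
DOC_RE = re.compile(r"<DOCUMENT>(.*?)</DOCUMENT>", re.S | re.I)
TYPE_RE = re.compile(r"<TYPE>([^\s<]+)", re.I)
TEXT_RE = re.compile(r"<TEXT>(.*?)</TEXT>", re.S | re.I)
# Heuristic (an assumption): treat a body as HTML if it has an <html> or <body> tag.
HTML_HINT_RE = re.compile(r"<(html|body)\b", re.I)


def split_documents(submission: str) -> list[dict]:
    """Return one dict per document with its type, an HTML flag, and its body."""
    docs = []
    for block in DOC_RE.findall(submission):
        type_match = TYPE_RE.search(block)
        text_match = TEXT_RE.search(block)
        body = text_match.group(1).strip() if text_match else ""
        docs.append({
            "type": type_match.group(1) if type_match else None,
            "is_html": bool(HTML_HINT_RE.search(body)),
            "body": body,
        })
    return docs


sample = """<DOCUMENT>
<TYPE>8-K
<TEXT>
<HTML><BODY><P>Press release attached as an exhibit.</P></BODY></HTML>
</TEXT>
</DOCUMENT>
<DOCUMENT>
<TYPE>EX-99.1
<TEXT>
Press release text, filed as plain ASCII.
</TEXT>
</DOCUMENT>"""

for doc in split_documents(sample):
    print(doc["type"], "HTML" if doc["is_html"] else "text")
# Expected output:
# 8-K HTML
# EX-99.1 text
```

A footnote search would then need two code paths, one per flag, which is where the coverage gaps described above tend to show up.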

I've worked on parsing the HTML tables for statements like the balance sheet, cash flow statement, etc. It was problematic and I only got about 70% of the way there, but I think a more complex rule base could get to 90%. The issue is that 90% isn't really good enough for many users.
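
A rule base of the kind described might start like this sketch: collect table rows with Python's standard html.parser, then match row labels against regex rules. The RULES set, the extract function, and the assumption that the first numeric column is the latest period are all hypothetical. Real filings (e.g. "$" and ")" split into their own cells, multi-level headers, restated columns) are the kind of thing that likely keeps such rules short of 100%.

```python
import re
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect every <tr> as a list of its non-empty cell texts."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read
        self._cell = None  # text fragments of the cell being read

    def _flush_cell(self):
        if self._cell is not None and self._row is not None:
            text = " ".join("".join(self._cell).split())  # collapse whitespace and &nbsp;
            if text:
                self._row.append(text)
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row, self._cell = [], None
        elif tag in ("td", "th") and self._row is not None:
            self._flush_cell()  # tolerate a missing </td>
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._flush_cell()
        elif tag == "tr" and self._row is not None:
            self._flush_cell()
            if self._row:
                self.rows.append(self._row)
            self._row = None


# Hypothetical rule base: canonical item -> regex on the row label.
RULES = {
    "cash": re.compile(r"^cash and cash equivalents$", re.I),
    "total_assets": re.compile(r"^total assets$", re.I),
    "total_liabilities": re.compile(r"^total liabilities$", re.I),
}
NUMBER_RE = re.compile(r"^\(?\$?\s*\d[\d,]*(\.\d+)?\)?$")


def parse_number(cell: str) -> float:
    """'1,234' -> 1234.0; accounting-style '(56)' -> -56.0."""
    cleaned = cell.replace("$", "").replace(",", "").strip()
    if cleaned.startswith("(") and cleaned.endswith(")"):
        return -float(cleaned[1:-1])
    return float(cleaned)


def extract(html: str) -> dict[str, float]:
    parser = TableRows()
    parser.feed(html)
    parser.close()
    found = {}
    for label, *cells in parser.rows:
        values = [c for c in cells if NUMBER_RE.match(c)]
        for item, pattern in RULES.items():
            if item not in found and values and pattern.match(label):
                found[item] = parse_number(values[0])  # assumes first column = latest period
    return found
```

Every new label variant ("Total liabilities and equity", "Cash & equivalents") means another rule, which is one way to read the 70% to 90% gap.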

I've heard that Capital IQ/Thomson Reuters actually use Indian financial professionals to manually extract the info. This could be a good way to backfill missing values or double-check bad ones, but I chose not to try that path. In the end, many potential customers will opt to pay a much higher price for a better brand and/or higher-level processing, such as normalizing accounting standards.


I was really excited that they'd figured out a way to cull balance sheet data out of the filings. Doesn't appear so, though: "XYZ Corp filed their annual statement today, you can view it here."


For historical data, this will be an issue.

Going forward, the SEC is requiring filers to make this process much easier by tagging their financial statements in XBRL:

http://xbrl.sec.gov/ http://www.sec.gov/rules/final/2009/33-9002.pdf
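
With XBRL, line items arrive as tagged facts rather than table cells. Here is a minimal sketch using Python's xml.etree.ElementTree, assuming a trimmed instance document (real instances also define the contexts and units that contextRef and unitRef point to). The us-gaap namespace year, the context ID FY2011, and the values are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Trimmed, illustrative instance document: real filings also contain the
# <xbrli:context> and <xbrli:unit> definitions that contextRef/unitRef point to.
INSTANCE = """<xbrli:xbrl xmlns:xbrli="http://www.xbrl.org/2003/instance"
    xmlns:us-gaap="http://fasb.org/us-gaap/2011-01-31">
  <us-gaap:Assets contextRef="FY2011" unitRef="USD" decimals="-6">10500000000</us-gaap:Assets>
  <us-gaap:Liabilities contextRef="FY2011" unitRef="USD" decimals="-6">2500000000</us-gaap:Liabilities>
</xbrli:xbrl>"""


def xbrl_facts(instance_xml: str) -> dict[tuple[str, str], dict]:
    """Map (concept local name, contextRef) to the fact's value and unit."""
    root = ET.fromstring(instance_xml)
    facts = {}
    for elem in root.iter():
        context = elem.get("contextRef")
        if context is None or elem.text is None:
            continue  # not a fact (e.g. the root element)
        concept = elem.tag.split("}", 1)[-1]  # drop the "{namespace}" prefix
        facts[(concept, context)] = {"value": elem.text.strip(), "unit": elem.get("unitRef")}
    return facts


facts = xbrl_facts(INSTANCE)
print(facts[("Assets", "FY2011")])
```

No table heuristics are needed here; the harder problem shifts to the historical filings that predate the mandate.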


I'd be interested to know how Google feels about automatically generating content from other content like this - at first glance it looks like a regular article, but after you read a few it's clear that they have some sort of dynamic template for the content. More legit than a content farm I suppose, but it's not necessarily high quality either.
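
Here is a guess at what such a "dynamic template" might look like: fixed sentence frames with a few slots whose wording depends on the numbers. This is a hypothetical sketch (the QuarterlyFacts fields, the 0.5% "flat" cutoff, and the sentence itself are invented for illustration), not MarketBrief's actual generator.

```python
from dataclasses import dataclass


@dataclass
class QuarterlyFacts:
    """Hypothetical fields pulled from a 10-Q; names are illustrative only."""
    company: str
    quarter: str
    revenue: float        # millions of USD
    prior_revenue: float  # same quarter, previous year


def change_phrase(current: float, prior: float) -> str:
    """Choose wording based on the year-over-year change."""
    if prior == 0:
        return "with no revenue reported a year earlier"
    pct = (current - prior) / prior * 100
    if abs(pct) < 0.5:
        return "roughly flat year over year"
    direction = "up" if pct > 0 else "down"
    return f"{direction} {abs(pct):.1f}% year over year"


TEMPLATE = "{company} reported {quarter} revenue of ${revenue:,.1f} million, {change}."


def render(facts: QuarterlyFacts) -> str:
    return TEMPLATE.format(
        company=facts.company,
        quarter=facts.quarter,
        revenue=facts.revenue,
        change=change_phrase(facts.revenue, facts.prior_revenue),
    )


print(render(QuarterlyFacts("Example Corp", "Q2 2011", 120.0, 100.0)))
```

Because only the slot wording varies, a reader who sees a few of these articles will spot the shared frame, which matches the impression described above.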


Thanks for checking out MarketBrief. We just launched our financial news service, but we're constantly iterating on the quality and analysis of our articles, and our sights are set much higher than simply churning content.


Ironically, when I started reading this I thought you were poking fun at the generated language, because your post sounded like it to me.


I haven't had a chance to check out the site yet, but I can tell you from personal experience that putting these filings in a more usable form is definitely an area that is underexploited at the moment.

I often comb through filings for the super regional banks. The majority of my time is spent looking for a few key items and arranging the data into a usable format. If you are looking for more ideas on how this area could be improved, I'd be glad to fill out a survey or give you additional comments.


Seems like a great product for a pain that most people have. I don't see professional investment firms or bulge bracket banks using this, however; at Citi we use FactSet and CapIQ, and even those services don't reveal everything and sometimes have errors. At the end of the day, the analysts usually go through the filings no matter what.

As other comments here have said, professional investors generally already have another service they pay for, or proprietary data/information (at hedge funds or prop shops, at least) that aggregates the specific data they are looking for.

Not quite sure what the target audience of this is; if it's investors who want data condensed to just q/q and y/y operational performance, then the financial information technology space is already pretty crowded.


We do a great job at locating resources. This has created a new problem: we have too much information to wade through. I would now like to see some sort of effort to structure this information. You make the comment that the financial information technology space is already pretty crowded, but I still haven't seen anything that structures all of this financial information CHEAPLY.

EDIT: Hell, if I could assemble a team, I think the best way to attack this problem would be to build out the same functionality that Bloomberg provides for equities, provide it for free, and then build a retail brokerage firm around it. Give people the information to make investment decisions easily, and they will make those investment decisions. Just be there to capture the revenue generated by their trades.


That's true; I have yet to see structured financial data for cheap, although the 'structure' varies depending on the targeted audience. Equity analysts might be concerned about operational performance, top-line growth, margins, and stock buyback programs. Debt/credit analysts might be concerned about cash flows, revolver draws, commercial paper programs, etc. There are many niche markets, and existing services may have already provided these structures (albeit not cheaply, and possibly proprietary if built at a firm).

I think building a retail brokerage firm around structured market information is harder than it sounds, unless the transaction cost of each trade reflects the value-add of having a 'free' service structure the financial data. Currently, transaction costs reflect the quality of investment reports prepared by (usually) CFA investment professionals. I'm not sure free structured investment information would yield transaction fees material enough to support a sustainable business model.


I have thought about the targeted audience portion of your comment. That is fully addressable through the use of templates. My daily work is in the interest rate risk space for a bank. As part of keeping up on the space, I read the financials of other banks. There are certain things I look for in these financials. If there was a way to create a template that was automatically populated with the information I am looking for, then that would be an improvement over the method I currently use. With that said, I'm not a computer scientist. My programming skills are limited to the more mathematical side of finance. I don't know how difficult it would be to provide such a thing given the challenges of parsing natural language.
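
A template like that could be as simple as a mapping from the labels an analyst cares about to tagged concepts in the filing data. In this sketch the template contents and the concept names (written in the style of US GAAP XBRL element names) are assumptions, and facts stands in for values already extracted from a filing. The hard part the commenter mentions, getting reliable values out of the documents in the first place, is not addressed here.

```python
# Hypothetical per-analyst template: display label -> assumed concept name.
RATE_RISK_TEMPLATE = {
    "Net interest income": "InterestIncomeExpenseNet",
    "Total deposits": "Deposits",
    "Loans, net": "LoansAndLeasesReceivableNetReportedAmount",
}


def fill_template(template: dict[str, str], facts: dict[str, float]) -> list[tuple[str, str]]:
    """Return (label, formatted value) rows, marking items the filing lacks."""
    rows = []
    for label, concept in template.items():
        value = facts.get(concept)
        rows.append((label, f"{value:,.0f}" if value is not None else "not found"))
    return rows


# Illustrative values standing in for one bank's extracted facts.
example_facts = {"InterestIncomeExpenseNet": 1250000.0, "Deposits": 48000000.0}
for label, shown in fill_template(RATE_RISK_TEMPLATE, example_facts):
    print(f"{label:<22}{shown:>14}")
```

Each analyst (equity, credit, interest rate risk) would keep a different template over the same underlying facts.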

As for the pay wrapper, I will admit that operating a brokerage firm at a large enough scale and a low enough price to attract a significant portion of the market share while somehow turning a profit would certainly be a challenge; however, I am not ready to abandon this line of thinking yet.


This is great. Talk about finding a real need and solving it. Well done, guys. Do you, or does anyone, know of something similar for licensing agreements? The ones we all sign without reading? That'd be great as well.


If they had an API, or a machine-readable format that contained only the data from each SEC filing, I would be very interested.


The SEC already makes filing data available in a machine-readable format; it's called XBRL. I don't think all companies are on it now, but there's a mandate that will eventually force all companies to use it.


Not back to 1996, they don't (the first year of online SEC filings), and that history is what those of us interested in doing panel studies would love to have. You can purchase it for ~$20,000 for a small institution; I'm not sure what an individual license would run.


It's a shame that one would have to pay such a large amount for such a data set. Let's not forget that the SEC exists largely to protect investors. Somehow, today's SEC framework/accounting rules seem to favor the filers of financial information rather than the users of financial information. This is a fundamental problem of the system today.


Shows how behind I am; thanks!


If they are automatically generating all the articles on the site that is quite impressive.


It is impressive, but they definitely need some more work. The current top story is about how somebody called "Gendell Jeffrey L Et Al" sold a bunch of stock. The phrase "Gendell Jeffrey L Et Al" clearly refers to multiple people, but their website is treating it like the name of a single person.
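
Here is one way to handle names like this, sketched under the assumption that EDGAR lists individual reporting owners as "Last First Middle" and that a trailing "Et Al" marks a group filing. normalize_filer_name is a hypothetical helper. It would mangle company names, so it should only be applied to filers already known to be individuals.

```python
import re
from typing import NamedTuple


class FilerName(NamedTuple):
    display: str
    is_group: bool


# Trailing "Et Al" / "et al." marks a filing on behalf of several people.
ET_AL_RE = re.compile(r"\s+et\.?\s+al\.?$", re.I)


def normalize_filer_name(raw: str) -> FilerName:
    """Assumes 'Last First [Middle]' order, as in 'Gendell Jeffrey L Et Al'."""
    name = " ".join(raw.split())
    is_group = bool(ET_AL_RE.search(name))
    name = ET_AL_RE.sub("", name)
    parts = name.split(" ")
    if len(parts) >= 2:
        last, rest = parts[0], parts[1:]
        # Single letters are treated as middle initials: "L" -> "L."
        rest = [p + "." if len(p) == 1 else p for p in rest]
        name = " ".join(rest + [last])
    display = f"{name} et al." if is_group else name
    return FilerName(display, is_group)


print(normalize_filer_name("Gendell Jeffrey L Et Al"))
```

The is_group flag matters more than the display string: a template can switch to plural wording ("Jeffrey L. Gendell and affiliated filers sold...") instead of treating the group as one person.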


I worked on a similar project while an undergrad at KU. (http://www.oread.ku.edu/2008/october/20/fraank.shtml)

I like the idea, a lot. This seems to be well executed, but I wonder how much of a future it has with XBRL becoming a requirement. In 5 to 7 years there will be enough historical data tagged in XBRL for any analyst to perform a financial analysis.

Also, on the website I can search 400,000+ companies, but there only seem to be ~4,900 filings on the dashboard. That doesn't add up to me.


As an avid investor, I must say it looks like a generic investor news site (complete with ads). I'm sure the content is there, but to my eyes it's a bit confusing (a wall of small tidbits?).


A few years back I wrote a scraper for 13F and 13G filings. There was no standard format requirement at the time (is there now?), so it was an 80/20 hack, with the remaining 20% completed via Mechanical Turk. Not sure what became of the project, but it seemed like there was a MenuPages-style opportunity to monetize the cleaned-up data. Glad someone actually went ahead and did it!

To the MarketBrief guys: are you using anything cool (machine learning?) to clean up historical data?


Nice!

Is there any logic that notices when a number that isn't normally included in the summary is surprising and should therefore be included, or is the summary template static?
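
A non-static version could flag any item whose current value is far from its own history. The sketch below uses a simple z-score against prior periods. The two-standard-deviation threshold, the three-period minimum, and the item names are arbitrary assumptions, not a description of how MarketBrief works.

```python
from statistics import mean, stdev


def surprising_items(history: dict[str, list[float]],
                     current: dict[str, float],
                     threshold: float = 2.0) -> list[str]:
    """Return items whose current value is >= threshold std devs from their history."""
    flagged = []
    for item, value in current.items():
        past = history.get(item, [])
        if len(past) < 3:
            continue  # too little history to judge
        sigma = stdev(past)
        if sigma == 0:
            if value != past[0]:
                flagged.append(item)  # any change from a constant series is notable
            continue
        z = (value - mean(past)) / sigma
        if abs(z) >= threshold:
            flagged.append(item)
    return flagged


history = {"legal_reserves": [10, 11, 9, 10], "revenue": [100, 104, 98, 102]}
current = {"legal_reserves": 45, "revenue": 103}
print(surprising_items(history, current))
```

Flagged items could then be appended to the otherwise static summary, e.g. as an extra sentence per item.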


There's a ton of places that this can go. Just building on top of the functionality they've already enabled, this could become quite powerful very quickly.

Here's a raw data stream that has existed for years and hasn't had any good combination of algorithms and UI applied to it. Big opportunity.

Watch Bloomberg, Reuters and Yahoo; they could be jousting for this in six months or less.


Bloomberg and Reuters already have competing services that cost tens of thousands of dollars each year for a subscription. If they bought this, it'd be to eliminate a competitor.


I hope that a competitor hangs around long enough to revolutionize this industry. The system does not serve retail investors well at the moment. This is a space with low hanging fruit.


Sorry for such a late reply, but I agree in full. I really loathe using the Bloomberg Terminal, and I can't believe it hasn't been done better yet. I'm hoping someone recognizes that low-hanging fruit.


I don't particularly have a problem with the Bloomberg Terminal; it's the price with which I have a problem. There is no other source currently available that provides the same information density as Bloomberg. The navigation within a terminal at first is a little complicated, but that is because the quantity of information you are dealing with is amazing. With that said, I would be interested in reading your thoughts on what could be improved.


Care to elaborate what you loathe about the terminal?


I first saw this post on my mobile and didn't click on the story link. Taking a look at the site, I was expecting something really user-friendly, with big letters and numbers and explanations alongside. Perhaps the mint.com of public finance. To me, this site still looks like MarketWatch or Bloomberg or any other finance site.


My first thought on seeing their site is that the search box and email alert box look just like sidebar ads for some reason. I think it's because they look like images, not HTML forms.

I wouldn't have used the search if I had visited from Google Finance.


I worked for a company in 2000 called Newsgrade that was working on a very similar (if not identical) concept. I liked the idea then and I like the idea now. I hope they are successful.


fool.com has been generating this type of templated natural language "analysis" of stocks for a while.

I expect most of the benefit to an expert would be in converting public documents into some standard, structured form, not in the cute (perhaps helpful to laymen) English presentation.


Now this is a great concept. Congrats on having the best idea I've seen so far this year, guys.


Sigh. Nobody knows what "obtuse" means anymore.


Hate to spoil a good melodramatic sigh of intellectual superiority, but "difficult to understand" is an accepted secondary definition of "obtuse" in many dictionaries.


[citation needed]


That's cute, since it was your unsubstantiated assertion in the first place that the word was being misused.

Tip: when unsure about the definition of a word, try googling "define" and the word in question. You will get results from a number of sources.


I did look it up, even before posting the first time.

I see that Google does list the definition you suggest. But it does so without attribution, as far as I can see. Dictionary.com (the place I had been looking), which takes definitions from several dictionaries, with attribution, does not offer the meaning in question.

You said "many dictionaries" would list this meaning. I think your claim remains unsubstantiated.


Clarification: the Google page for "obtuse" does list a source for the meaning "hard to understand", but not only is that source not a dictionary, it offers the meaning only to say that using the word with that meaning "is not considered good style".


So who is the target audience? Finance pros can already read SEC filings.

SEO is the only obvious source of traffic and money I can think of.


Make a team of finance analysts who all cost $200+ an hour 10% more efficient. Collect money hats.

Seriously, this is like "Bug tracking? You've got high priced professionals, why can't they just use Excel or sticky notes?"


I'm only a single data point, but I'd rather see data like this in a non-prose format, preferably in tables -- the way SEC documents are served up from EDGAR.


Lawyers? Retail investors?

Even for finance pros who currently parse the raw filings, I can see how having an auto-generated human-readable summary that pulled out the key points would save enough time to be worth spending money on.


So here are the two versions:

http://marketbrief.com/goog/10q/quarterly-report/2011/5/10/7...

http://sec.gov/Archives/edgar/data/1288776/00011931251113442...

They look pretty much identical to me. Plus, sec.gov has the June 30th report that MarketBrief simply doesn't have.

http://www.sec.gov/Archives/edgar/data/1288776/0001193125111...

Look: http://marketbrief.com/goog/financials

The last one is from May 10th.

And this is Google we're talking about, a major player. Imagine what they might be missing for smaller companies.


Their pro package is cheap for an analyst data service.


Doesn't answer the question. Any hedge fund or large trading desk will have prop tools that are probably 100X better, as they have unlimited $ to go after more data. Is the hobbyist going to subscribe? Doubtful. Cool idea, but small market, IMO.


I should have been more clear that finance pros do not all already read SEC filings. Some pay for news services this could replace. Some read the filings manually and would pay a small amount of money to not bother with edgar. They won't get the unlimited-budget firms but they could get a branch manager to replace a $3k/year newsletter with this.


Hey Michael, would you contact me please, I'd like to chat with you more about this idea. My email is in my profile. Thanks


Sent.


Build out a system that organizes all of this financial information. Make it freely available. Build a pay service around it: top notch, cheap retail brokerage.

What other services could you get investors to pay for if they had all the information necessary to make their own investment decisions in an easy-to-use format?



