
Disclaimer: I am not defending or denying the claims in the paper. I have a philosophical interest in time.[1]

The difference, I think, is that the series in a genome is lexicographic rather than temporal; that is, which end of a particular strand you read from is irrelevant. A time series, on the other hand, has an independent ordering. By analogy, a time series is a directed graph; a genome is undirected.

That is, the algorithm for finding a subsequence in a time series can be used for finding a subsequence in a genome, but the meaning of a subsequence is different. It's:

  A - G - A - C
versus

  C -> A -> G -> A.
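To make the directed/undirected distinction concrete, here is a minimal Python sketch that follows the framing in this comment (it is not taken from the paper). The same naive subsequence scan serves both cases. The only difference is whether a match of the reversed pattern also counts. The function names `find_directed` and `find_undirected` are hypothetical, and the undirected version adopts the comment's simplification that the direction you read a strand in doesn't matter.

```python
from collections.abc import Sequence


def find_directed(seq: Sequence[str], pattern: Sequence[str]) -> list[int]:
    """Start indices where `pattern` occurs reading left to right only.

    Time-series view: the order is fixed, so only forward matches count.
    """
    n, m = len(seq), len(pattern)
    target = list(pattern)
    return [i for i in range(n - m + 1) if list(seq[i:i + m]) == target]


def find_undirected(seq: Sequence[str], pattern: Sequence[str]) -> list[int]:
    """Start indices where `pattern` occurs reading in either direction.

    Genome view (as framed in the comment): a match of the reversed pattern counts too.
    """
    forward = find_directed(seq, pattern)
    backward = find_directed(seq, list(reversed(pattern)))
    # A set removes duplicates when the pattern is its own reversal.
    return sorted(set(forward) | set(backward))


if __name__ == "__main__":
    strand = "TTCAGAT"
    print(find_directed(strand, "AGAC"))    # []
    print(find_undirected(strand, "AGAC"))  # [2]  ("CAGA" is "AGAC" read backwards)
```

The search algorithm is identical in both functions. What changes is which matches we are willing to call "the same subsequence," which is the point about meaning rather than mechanics.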
Time series data contains implication and causality. Any claim about time series data is implicitly a claim about causality; e.g., we might claim that a time series appears to be random. We don't really talk about a genome's nucleotide sequence being random, because the causality (other than the trivial case of analogues) lies outside the sequence: the reason for

  T - A - C
is assumed to lie outside of T - A - C.

[1]: http://plato.stanford.edu/entries/kant-hume-causality/#TimDe...




This is pretty much the main idea behind another paper that Eamonn wrote [0]. The idea is that you can normalize and discretize time series data into a lexicographic (symbolic) representation, then perform all sorts of interesting analyses on it. For example, searching for common subsequences can be done with regular expressions.

[0] http://www.cs.ucr.edu/~eamonn/iSAX.pdf



