Looking for input to project idea for worldwide data collect

pascalwhoop Wed Nov 08, 2017 3:26 pm

Hello to this great community,
I would like to first introduce myself so that people can size me up. I am a 25 year old computer scientist and software engineer, but I do not have MS. I just feel like it is important to disclose this in such a community, where a certain feeling of unity might not welcome an outsider like me.

That being said, I do have a loved one with MS and I am looking for ways to apply my competencies to help her and others to live with this disease.

The project on which I seek feedback

Building a large, community-driven dataset about Multiple Sclerosis cases and analyzing it with new Machine Learning technologies to find both causes and treatments.

My motivation
I have been trying to find decent data about multiple sclerosis patients, statistics and overall valuable information to which one could apply modern machine learning techniques, but to no avail. Surely, there have been studies of various forms, and I have also found the summary of all polls in this forum. But I was not able to find a publicly available dataset of sufficient scale.

Why I think machine learning can help
The news is full of the topic: AI here, self-driving cars there, and so on. Let me give you my 3 points that I think match the problem of "finding out why" MS is a thing:
  • MS must have a cause. It is just not a simple one. There could be dozens of interrelated factors that cause MS. Neural Networks "think" in n-dimensional space (often several thousands of dimensions) and can therefore find correlations and even causality links that we "dull" humans have trouble wrapping our heads around.
  • The recent explosion in Machine Learning (a subfield of AI) is caused by 3 things: data availability, processing power and algorithm maturity. Yet there isn't much research going on regarding Machine Learning and MS. Why? I believe it is because of a lack of available datasets.
  • Most research in medicine seems to be driven by the professionals. Doctors and professors come up with research projects and perform studies. On the other hand, the internet has seen large communities rise and come up with novel ways to work together to achieve a common goal. There are thousands and thousands of people suffering from MS. This forum and other places around the internet are meeting places for you all. But I am sure the combined force of intellect that is spread throughout the community can achieve more than just observing corporations and practitioners as they proceed with their studies. More importantly though, all these snippets of data need to become aligned and unified. Otherwise it is like trying to see a picture by looking at a random piece of a puzzle every morning.

What would be needed

Luckily, the technology companies of today offer many tools to create a pipeline as well as a community-driven way of collecting data. The concept is made up of a few parts:
  • An information platform. This could be a website, a wiki or other common tools that explain the whole endeavor.
  • A unique (but anonymous) identifier for each participant. All data must come from the participants, not from institutions. Maybe this can change later on, but the core idea is one unique ID per patient so that all sorts of different factors can be tracked.
  • A pipeline of hypothesis/question pairs that is run by the community. This includes posting + voting to determine a prioritized list and ensure high-quality queries. A simple example would be: "Is MS correlated to latitude?" with the corresponding question "Where do you live?", or "Is MS correlated with lactose?" paired with questions about past lactose consumption.
  • All queries would be public to be answered by all participants, and the answers would then be matched to the IDs and stored. Ideally we would slowly move towards hundreds or thousands of data points per person.
  • The database would be accessible to any research group that applies for access, and statistics and analysis tools would be available to any member.
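
To make the parts above more concrete, here is a minimal sketch in Python of how the anonymous IDs, the voted hypothesis/question pairs and the stored answers could fit together. All names (new_participant_id, Query, Dataset) and the sample answer are hypothetical and only serve as illustration; a real system would sit on top of a proper database and handle consent and privacy much more carefully.

```python
import uuid
from dataclasses import dataclass, field


def new_participant_id() -> str:
    """Return a random identifier that carries no personal information."""
    return uuid.uuid4().hex


@dataclass
class Query:
    """A community-proposed hypothesis paired with the question that tests it."""
    hypothesis: str
    question: str
    votes: int = 0


@dataclass
class Dataset:
    """In-memory stand-in for the shared database (assumption, not a real backend)."""
    queries: list = field(default_factory=list)
    # answers[participant_id][question] = answer text
    answers: dict = field(default_factory=dict)

    def propose(self, hypothesis: str, question: str) -> Query:
        """Add a new hypothesis/question pair to the pipeline."""
        query = Query(hypothesis, question)
        self.queries.append(query)
        return query

    def prioritized(self) -> list:
        """Queries ordered by community votes, highest first."""
        return sorted(self.queries, key=lambda q: q.votes, reverse=True)

    def record_answer(self, participant_id: str, query: Query, answer: str) -> None:
        """Store an answer under the participant's anonymous ID."""
        self.answers.setdefault(participant_id, {})[query.question] = answer


dataset = Dataset()
latitude = dataset.propose("Is MS correlated to latitude?", "Where do you live?")
lactose = dataset.propose("Is MS correlated with lactose?", "How much lactose did you consume?")
latitude.votes += 3  # made-up vote counts for illustration
lactose.votes += 1

participant = new_participant_id()
dataset.record_answer(participant, latitude, "example city")  # placeholder answer
```

Keying every answer by the same random ID is what would let hundreds of answers from one person line up into a single row later on, without storing a name anywhere.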

Who would be needed
  • Software developers: web technologies will be primary, with some backend tech as well. Mostly though, we can make use of existing cloud services.
  • Statisticians and data scientists for the cleaning, structuring and analysis of the data.
  • A large community of participants.
  • Answers by healthy people. Ideally every participant can answer questions in two modes: "Me" and "Person X", helping to build a database on both healthy and diagnosed people. Ideal individuals would be spouses, relatives etc.
  • Translation help, mostly for website / app translations.
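
The reason answers from healthy people matter can be sketched with a simple comparison. The following Python snippet is only an illustration under stated assumptions: the records are made-up toy data, the function name odds_ratio is hypothetical, and a yes/no factor is a strong simplification of real answers. It computes an odds ratio from a 2x2 table of diagnosed vs. healthy participants.

```python
from collections import Counter

# Made-up toy records purely for illustration: (has_ms, exposed_to_factor).
records = [
    (True, True), (True, True), (True, False),
    (False, True), (False, False), (False, False),
]


def odds_ratio(rows):
    """Odds ratio of a yes/no factor between diagnosed and healthy participants."""
    counts = Counter(rows)
    a = counts[(True, True)]    # diagnosed, exposed
    b = counts[(True, False)]   # diagnosed, not exposed
    c = counts[(False, True)]   # healthy, exposed
    d = counts[(False, False)]  # healthy, not exposed
    if b == 0 or c == 0:
        raise ValueError("odds ratio is undefined when a comparison cell is empty")
    return (a * d) / (b * c)


ratio = odds_ratio(records)
```

A ratio well above 1 would only hint at an association, not a cause, and without the healthy group there would be nothing to compare against at all.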


I would greatly appreciate feedback on these ideas of mine. I cannot offer payment or promise anything. But I know the Open Source communities of the internet include many people willing to work on software and problems as a hobby or for other reasons. I am sure some will also have loved ones suffering from MS or be suffering from it themselves.

  • If I am stupid and there is such a thing already, please just point me in the right direction and I will not bother anyone any further.
  • If this has no chance, please tell me why.
  • If you have improvement ideas or comments, please share them with me!
Thank you for the attention and take care everyone!


- State of Neural Networks - High Level Overview as Introduction https://www.youtube.com/watch?v=mFYM9j8bGtg
- Neural Networks developing relational capabilities that surpass humans https://arxiv.org/pdf/1706.01427.pdf
- Many interesting summaries about developments in AI https://www.youtube.com/user/keeroyz
- Data For Democracy, a community of data scientists and aspiring ones trying to solve problems http://datafordemocracy.org/

Solution Techs:
- Firebase (database)
- SurveyMonkey (questioning frontend)
- Angular / D3.js / Material (App)
- Discord (live talk/chat platform)
- (if it becomes real) this forum: Forum
- GitHub (all code)
- data.world (potential publishing for other researchers)
- Facebook, YouTube, News (spreading the word)

Re: Looking for input to project idea for worldwide data col

David1949 Wed Nov 08, 2017 7:30 pm

Seems like a good idea to me. I did something similar many years ago, although it was probably not as sophisticated as what you propose. The only things that correlated with MS were latitude and saturated fat in the diet.

I think MS is at least two diseases: RRMS and PPMS. Some recent studies suggest it may be 4 different diseases. That complicates things a bit because then you are looking for 4 different causes.

Good luck! I think this is definitely worthwhile.

