WinkNLP – a journey to open source
– In this talk, Graype co-founder Sanjaya will share:
- our transition from working with closed source to open source;
- the journey of developing winkNLP, our developer-friendly JavaScript library for Natural Language Processing (NLP);
- the challenges we have mitigated and continue to address as the library grows and becomes more robust.
– Background: our modest start with small packages, mostly around NLP, and what encouraged us to take the plunge and develop a comprehensive NLP tool:
- User feedback and community encouragement
- Application of our packages by a range of projects like Social Analyzer and Trustroots, among others
- The community's role in moving the licence from GPL to MIT
- Our focus on standards, quality and documentation
– Beginning: in a landscape dominated by NLTK, spaCy and CoreNLP, we began developing an integrated NLP product in late 2018 with the following objectives:
- lightweight
- developer friendly
- balance between performance and accuracy
- support for both pure browser-side and server-side use
– Journey: Sanjaya will highlight how, with a team of 2½ people, we met our objectives and addressed the roadblocks we faced:
- the difficulty in finding labelled data with permissive licences for training models, and our alternative approach, in which we:
  - created language models using hierarchical finite state machines
  - used limited open source data and complemented it with manual effort
- developing a declarative syntax to make winkNLP developer friendly
- meeting performance and accuracy benchmarks
- outcome: a quick demo in the web browser
- learning from user feedback
– Finally, Sanjaya will talk about where we are today and where we are headed next, including our:
- Current challenges: sustaining an open-source project, and keeping pace with a fast-changing technological landscape; and
- Future plans: to extend winkNLP to more languages that use the Latin script, as well as Indian languages; and our efforts to nurture a community around the project.