Developing Artificial Intelligence (AI) and Advanced Analytics is the primary mission for LexisNexis, and we are seeing an acceleration of amazing AI work in the legal space. We are looking for an experienced data scientist who is devoted to building the most accurate legal data. If you have built and modified your own models to improve on out-of-the-box results, then we are looking for you!


Accuracy of our primary data entities such as courts, judges, law firms, attorneys and companies is critical to our success, and we need new and innovative ways to extract legal knowledge and pinpoint specific data. The Principal Data Scientist will lead a team in defining, deploying and maintaining generic extraction models that integrate with our existing processing of the web pages that form the core of some of our authorities. Data will be extracted from a range of informational structures such as paragraphs, lists and sentence fragments. The Principal Data Scientist will also be responsible for overseeing machine learning optimization models used during data processing and for detecting potential duplicates in large datasets. The successful candidate must be able to quickly become familiar with LexisNexis' diverse data resources and to leverage the latest NLP technologies to define specific models and logic for legal and corporate entities. The ideal candidate has built and maintained large-scale production NLP and machine learning models and analyzed their output for accuracy and completeness. They also have experience leading and tutoring other team members on cutting-edge NLP processes and model building.


>Develop innovative strategies for extracting information from web pages for legal and corporate entities

>Define the ML models (BERT, ELMo, GPT, etc.), APIs and open-source methods to use, and quickly evaluate alternatives

>Manage, validate and deploy custom extraction models and machine learning models

>Integrate models into existing extraction and processing logic

>Create machine learning models to optimize decision trees

>Create processes to quickly identify potential duplicates in large entity sets

>Write queries and reports to validate extraction models and processing models

>Perform cost analysis to determine optimal models that balance model performance against processing costs

>Document all models and provide APIs to quickly test and validate models and model changes


>5+ years' experience with machine learning and associated packages such as scikit-learn, pandas and NumPy

>Proficiency in training large-scale models in at least one modern deep learning framework such as TensorFlow, Keras, PyTorch/Torch, MXNet or Caffe/Caffe2

>3+ years' experience using NLP tools and methods such as OpenNLP, Stanford NLP, spaCy, Gensim, etc.

>PhD in NLP and/or Machine Learning

>Teaching experience or conference speaking engagements are required

>5+ years' experience with AWS products

>Must have hands-on experience deploying large-scale models

>Must have in-depth familiarity with machine learning models and decision trees

>Must have in-depth familiarity with SQL

>Expert Python programmer

>AWS Certification is a plus

>Ability to provide guidance and training to engineers, system engineers and other team members

>Documentation skills for processes and procedures

