I use Emacs for email and Org capture to keep track of my open loops. Recently I have been trying to combine the two by capturing tasks that link to emails. I wanted to tag my capture items based on the inbox they fall into: :@home: for my personal emails and :@school: for my school emails. I cobbled together this function from different Stack Exchange posts and figured I would share it here for anyone who wants to do the same.
One of the problems in Natural Language Processing (and a problem I'm facing at work) is how to cluster documents into groups based on their contents. There are two broad approaches to solving the document clustering problem: supervised and unsupervised machine learning. Supervised machine learning relies on labeled data, while unsupervised learning tries to categorize the data without any prior labels. Both methods have their ups and downs, which I will not go into here.
Easy code is easy to compile and run. That always has been and always will be true. However, once the code you write spans multiple classes, files, or even packages, it can be hard to properly test, compile, and release the software. Continuous integration (CI) tries to solve this problem by defining a pipeline of actions that takes your code from source to product and runs the same way every time.
Email management, when heavily abstracted, is simple. To start reading email offline on your own PC you need three programs:
Sync email to/from IMAP server (mbsync)
Manage email on your PC (mu and mu4e)
Send email (msmtp)
Once these three parts are working together, email can be downloaded, viewed, and replied to. Getting these programs to work, however, is no easy task.
Introduction
Throughout the past semester I have been working on my senior capstone project for my CS undergraduate degree. The project is to create Emoji summaries for sentences, and one of the integral parts of this algorithm is separating a sentence into a sequence of n-grams that represent it. In the initial algorithm, I took the naive approach of generating every single combination of n-grams, summarizing them all, and then returning the summary with the highest score.
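To make the naive approach concrete, here is a minimal Python sketch. It assumes that "every combination of n-grams" means every way of cutting the sentence into contiguous word groups, which may not match the project's actual code. The names `all_ngram_splits`, `best_split`, and the `score` callback are hypothetical placeholders; `score` stands in for whatever summarization score the real algorithm uses.

```python
from itertools import combinations


def all_ngram_splits(sentence):
    """Yield every way to cut a sentence into contiguous n-grams.

    Assumption: words are separated by whitespace. A sentence of n words
    has n - 1 gaps, and each subset of gaps is one split, so there are
    2 ** (n - 1) splits in total.
    """
    words = sentence.split()
    n = len(words)
    if n == 0:
        return
    for k in range(n):  # k = number of cut points to place
        for cuts in combinations(range(1, n), k):
            bounds = (0, *cuts, n)
            yield [" ".join(words[a:b]) for a, b in zip(bounds, bounds[1:])]


def best_split(sentence, score):
    """Return the split with the highest score (score is a placeholder)."""
    return max(all_ngram_splits(sentence), key=score)


if __name__ == "__main__":
    for split in all_ngram_splits("the quick fox"):
        print(split)
```

Because the number of splits doubles with every extra word, this exhaustive search becomes slow for long sentences, which is why the approach is called naive here.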
Motivation
My senior capstone project for my computer science degree is research focused on summarizing sentences. My group mate and I decided to try to accomplish this by converting sentences into Emoji. We think that this will produce a more information-dense string. This problem is similar to a number of other problems, both in computer science and in unrelated domains. Within computer science, it is adjacent to the Emoji prediction and Emoji embedding problems.
Term Frequency-Inverse Document Frequency (commonly abbreviated as TF-IDF) is a formula commonly used in Natural Language Processing (NLP) to determine the relative importance of a word. The formula is composed of two sub-formulas: term frequency and inverse document frequency. The basic assumption of this formula is that if a word appears more often in one document and less often in every other document in the corpus, then it is very important to that specific document.
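As an illustration, here is a minimal Python sketch of one common TF-IDF variant. It assumes term frequency is the word's count divided by the document length, and inverse document frequency is the natural log of the corpus size divided by the number of documents containing the word. Other weightings and smoothing schemes exist, and the function names are illustrative only.

```python
import math
from collections import Counter


def term_frequency(term, doc):
    """Fraction of the tokens in doc (a list of words) equal to term."""
    if not doc:
        return 0.0
    return Counter(doc)[term] / len(doc)


def inverse_document_frequency(term, corpus):
    """log(N / df), where df is the number of documents containing term."""
    df = sum(1 for doc in corpus if term in doc)
    # Assumption: a term that appears in no document gets weight 0.
    return math.log(len(corpus) / df) if df else 0.0


def tf_idf(term, doc, corpus):
    """Product of the two sub-formulas for one term in one document."""
    return term_frequency(term, doc) * inverse_document_frequency(term, corpus)


if __name__ == "__main__":
    corpus = [
        ["the", "cat", "sat"],
        ["the", "dog", "ran"],
        ["the", "cat", "ran"],
    ]
    for word in ("the", "cat", "sat"):
        print(word, tf_idf(word, corpus[0], corpus))
```

In this toy corpus "the" appears in every document, so its weight is zero, while "sat" appears only in the first document and receives the highest weight there, which matches the assumption described above.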