Pull to refresh
104.49

GitHub *

Web-service for hosting and development of IT projects

Show first
Rating limit
Level of difficulty

C# or Java? TypeScript or JavaScript? Machine learning based classification of programming languages

Reading time6 min
Views1.6K
GitHub hosts over 300 programming languages—from commonly used languages such as Python, Java, and Javascript to esoteric languages such as Befunge, only known to very small communities.


Figure 1: Top 10 programming languages hosted by GitHub by repository count 

One of the necessary challenges that GitHub faces is to be able to recognize these different languages. When some code is pushed to a repository, it’s important to recognize the type of code that was added for the purposes of search, security vulnerability alerting, and syntax highlighting—and to show the repository’s content distribution to users.

Linguist is the tool we currently use to detect coding languages at GitHub. Linguist a Ruby-based application that uses various strategies for language detection, leveraging naming conventions and file extensions and also taking into account Vim or Emacs modelines, as well as the content at the top of the file (shebang). Linguist handles language disambiguation via heuristics and, failing that, via a Naive Bayes classifier trained on a small sample of data. 

Although Linguist does a good job making file-level language predictions (84% accuracy), its performance declines considerably when files use unexpected naming conventions and, crucially, when a file extension is not provided. This renders Linguist unsuitable for content such as GitHub Gists or code snippets within README’s, issues, and pull requests.

In order to make language detection more robust and maintainable in the long run, we developed a machine learning classifier named OctoLingua based on an Artificial Neural Network (ANN) architecture which can handle language predictions in tricky scenarios. The current version of the model is able to make predictions for the top 50 languages hosted by GitHub and surpasses Linguist in accuracy and performance.
Read more →

A drawing bot for realizing everyday scenes and even stories

Reading time6 min
Views1.6K

Drawing bot


If you were asked to draw a picture of several people in ski gear, standing in the snow, chances are you’d start with an outline of three or four people reasonably positioned in the center of the canvas, then sketch in the skis under their feet. Though it was not specified, you might decide to add a backpack to each of the skiers to jibe with expectations of what skiers would be sporting. Finally, you’d carefully fill in the details, perhaps painting their clothes blue, scarves pink, all against a white background, rendering these people more realistic and ensuring that their surroundings match the description. Finally, to make the scene more vivid, you might even sketch in some brown stones protruding through the snow to suggest that these skiers are in the mountains.


Now there’s a bot that can do all that.

Read more →

Introducing GitHub Package Registry

Reading time2 min
Views2.3K

Today, we’re excited to introduce GitHub Package Registry, a package management service that makes it easy to publish public or private packages next to your source code.


GitHub Package Registry is fully integrated with GitHub, so you can use the same search, browsing, and management tools to find and publish packages as you do for your repositories. You can also use the same user and team permissions to manage code and packages together. GitHub Package Registry provides fast, reliable downloads backed by GitHub’s global CDN. And it supports familiar package management tools: JavaScript (npm), Java (Maven), Ruby (RubyGems), .NET (NuGet), and Docker images, with more to come.


You can try GitHub Package Registry today in limited beta. It will always be free to use for open source—more pricing details will be announced soon.


Sign up for the beta


Announcing GitHub Package Registry

Read more →

Announcing the Open Sourcing of Windows Calculator

Reading time2 min
Views1K

Today, we’re excited to announce that we are open sourcing Windows Calculator on GitHub under the MIT License. This includes the source code, build system, unit tests, and product roadmap. Our goal is to build an even better user experience in partnership with the community. We are encouraging your fresh perspectives and increased participation to help define the future of Calculator.


Image of Windows Calculator

Read more →

Configure the development environment for learning HTML, CSS, PHP in Windows

Reading time8 min
Views3.7K

Configure the development environment for learning HTML, CSS, PHP in Windows.


(article is published on linkedin.com (Alexander Panov) )


web - developer PHPStorm and Vagrant


Beginning programmers always ask the question, where to start learning about creating websites? Search on the Internet gives a lot of various information, but unfortunately only more confuses novice developers. Having more than 5 years of teaching experience, I have seen many times how students, making breaks in computer games and deciding to become web developers, tried to customize development environments. They began to sort out the programs for layout. They grabbed Sublime Text, Notepad ++, Visual Studio Code, NetBeans or PHPStorm. We tried to install on our local computer WAMP (stands for Windows, Apache, MySQL and PHP) platforms, such as Denver, Xampp, Open Server and many others. But since there is no systematic knowledge in the head of a novice developer, there is no experience of customization, it was rare for some of them to set up a convenient development environment, and whoever got it, understood that it became more difficult to play, since many additional programs, smoothness in the game.


This article was born because of explaining for many years to each student how to do everything correctly and conveniently.


Once again I want to say that we will set up a working environment in the Windows system to study the creation of sites in HTML, CSS and their links to PHP. We will call this computer working, so that beginning web developers get used to the idea that their computer is their source of income.


Begin

Читать дальше →
2