CAE Web Hosting Architecture Overview

This document describes the basic architecture of CAE's latest general purpose web hosting infrastructure, in case it affects decisions users make while setting up their vhosts and "off the shelf" web apps. Because many of these topics are interrelated, it is difficult to give an overview without referencing other parts of the document or going into more detail than every reader needs. Please read the full document before contacting us with questions.

Terminology

For clarity, this document uses the following terminology:

  • website or vhost to mean a single http://VHOST_NAME.DOMAIN/ container.
  • webapp to mean a particular set of code housed in a website. For instance, WordPress, Drupal, and Joomla are considered "off-the-shelf" webapps. As an aside, please note that it is up to the vhost owner and editing group to keep these webapps up to date with their respective security patches.
  • webserver to mean the server software (in this case Apache) that responds to client requests for a vhost's content.
  • server to mean a machine hosting one or more webservers.
  • editing group to mean the editing group associated with a vhost as specified in the vhost's control panel on https://my.cae.wisc.edu/tools/account/vhosts/.
  • vhost control panel to mean the tools referenced at the URL given above.

Server Overview

This section briefly outlines some of the server components involved in the CAE web hosting environment.

In general we provide a LAMP stack with some caveats.

  • First and foremost, for scalability and reliability reasons, the LAMP stack is not hosted on a single machine. In other words, referencing localhost often will not work when attempting to use SFTP, SSH, or MySQL within a webapp. Please see below for the various access methods for these separate services.
  • A corollary of this is that each website does not have its own server machine. Instead, several webservers are run on a single server of a particular "type" (eg: PHP, HTML, Perl, etc.).
  • Websites sharing the same editing group and type and domain are combined into a single webserver for scalability reasons. For example, if VHOST_NAME_1.DOMAIN and VHOST_NAME_2.DOMAIN are both assigned the type php5 and editing group bpk-test, then they will be served by the same webserver.
  • Websites with different domains are separated out to different webservers. This lets us provide basic SSL/HTTPS functionality for the majority of our commonly served domains, so that developers can help protect their users' sensitive data (eg: webapp usernames/passwords). For all other domains there is currently a fake * certificate in place, which will cause browsers to issue a warning. If you'd like your non-wisc.edu domain to have a real certificate, please contact us.
  • For load balancing or maintenance reasons the webserver may move or replicate to another server from time to time. This is part of the reason why direct logon to "your website's server" isn't supported - the concept of "your website's server" doesn't really exist. This process is transparent to users and normally done automatically at night.
  • For security reasons, each webserver runs as its own user to prevent a compromised webapp from disrupting other sites. The user is automatically created and managed by CAE. It is named web-EDITING_GROUP and by default is given an email alias of group/EDITING_GROUP@cae.wisc.edu, so that any email sent to (or bounced back to) web-EDITING_GROUP@cae.wisc.edu will reach the entire group. If you wish to change the target of this alias, please contact us.
  • For scalability reasons, PHP type vhosts do not have all potentially supported PHP modules (eg: ffmpeg) loaded by default. Loading them all would be a performance drain on sites that don't need that support and would require an enormous amount of memory. If you find that a module you need is missing, please use the vhost control panel tools to enable it. If the module you'd like to use isn't listed there, please let us know.
  • For security reasons we have egress firewall rules in place to prevent websites from accessing code from unknown sources (this is a common webapp exploit technique). There is a substantial whitelist for "trusted" hosts already in place (eg: google, wordpress, facebook, etc.), but if you find that you are having problems making outbound connections from your scripts, please let us know.
  • For security and performance reasons, each backend webserver is proxied through a frontend webserver (eg: webserver-N.DOMAIN) that also runs ModSecurity, an application layer firewall. If you find that this is getting in your way, please contact us with a specific RequestID and error message. Here is a link for an example error message. Note that the exact details of the message may change depending upon whether or not you're coming from a campus network address.
  • Each backend website is generally CNAMEd to one of these proxy webservers.
  • Websites are served using both IPv4 and IPv6 so that users without IPv4 access can still view the site and so that we're prepared as the rest of the world transitions to IPv6.
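The grouping rule in the list above can be sketched in Python. This is only an illustration under stated assumptions, not CAE's actual provisioning code: the Vhost record and both function names are hypothetical, and only the web-EDITING_GROUP naming convention comes from this document.

```python
from collections import defaultdict
from typing import NamedTuple


class Vhost(NamedTuple):
    """Hypothetical record for one vhost (not a CAE data structure)."""
    name: str           # eg: "VHOST_NAME_1"
    domain: str         # eg: "DOMAIN"
    vhost_type: str     # eg: "php5"
    editing_group: str  # eg: "bpk-test"


def group_into_webservers(vhosts):
    """Group vhosts that share the same editing group, type, and domain."""
    webservers = defaultdict(list)
    for vhost in vhosts:
        key = (vhost.editing_group, vhost.vhost_type, vhost.domain)
        webservers[key].append(f"{vhost.name}.{vhost.domain}")
    return dict(webservers)


def webserver_user(editing_group):
    """Name of the user a webserver runs as, per the convention above."""
    return f"web-{editing_group}"
```

With the example from the list, VHOST_NAME_1.DOMAIN and VHOST_NAME_2.DOMAIN (both php5, both bpk-test) end up under one key, while a vhost in a different domain gets a webserver of its own.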

Filesystem Layout

The typical filesystem layout for all of our Linux servers and clients (eg: best-tux.cae.wisc.edu or webshell.cae.wisc.edu) has the root of each vhost's servable content at /home/vhosts/VHOST_NAME.DOMAIN/html. This is the content that the webserver will serve to clients. So, a client request for http://VHOST_NAME.DOMAIN/foo_bar_baz.html will (in the absence of any Apache redirects) resolve to /home/vhosts/VHOST_NAME.DOMAIN/html/foo_bar_baz.html.
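As a rough illustration of that mapping, the following Python sketch turns a request URL into the file it would normally resolve to. It assumes an absolute URL and ignores Apache redirects, aliases, and rewrites; document_path is a hypothetical helper name, and example.cae.wisc.edu is a made-up vhost used only for the usage note.

```python
import posixpath
from urllib.parse import unquote, urlparse

VHOSTS_ROOT = "/home/vhosts"


def document_path(url):
    """Map a request URL to the file under the vhost's html directory."""
    parts = urlparse(url)
    docroot = posixpath.join(VHOSTS_ROOT, parts.hostname, "html")
    # Normalise against "/" so ".." segments cannot climb above the docroot.
    request_path = "/" + unquote(parts.path).lstrip("/")
    relative = posixpath.normpath(request_path).lstrip("/")
    return posixpath.join(docroot, relative) if relative else docroot
```

For instance, document_path("http://example.cae.wisc.edu/foo_bar_baz.html") gives /home/vhosts/example.cae.wisc.edu/html/foo_bar_baz.html.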

Above this in the directory tree are usually two other directories that we automatically create:

  • /home/vhosts/VHOST_NAME.DOMAIN/etc
    This is usually a good place to store includes that contain db login information or authentication files generated by htpasswd since then they can't be directly served to malicious clients by the webserver, but can still be used by your scripts or .htaccess files.
  • /home/vhosts/VHOST_NAME.DOMAIN/data
    This is usually a good place to keep code libraries that don't directly generate content for your website. You could also use this area to keep uploaded files that you don't intend to serve back through your website.
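One possible way to use the etc directory is to keep database settings in a small INI file there and read it from your scripts. The sketch below is in Python for illustration only; the file name db.ini, the [mysql] section name, and both function names are assumptions, not something CAE creates for you.

```python
import configparser
import posixpath

VHOST_ROOT = "/home/vhosts/VHOST_NAME.DOMAIN"
# Hypothetical file: lives in etc, so it is outside the servable html tree.
DB_CONFIG = posixpath.join(VHOST_ROOT, "etc", "db.ini")


def parse_db_settings(text):
    """Parse INI text and return the [mysql] section as a dict."""
    # interpolation=None keeps "%" characters in passwords literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return dict(parser["mysql"])


def load_db_settings(path=DB_CONFIG):
    """Read the settings file from the vhost's etc directory."""
    with open(path, encoding="utf-8") as handle:
        return parse_db_settings(handle.read())
```

Because the file sits next to html rather than inside it, a client cannot fetch it with a plain URL request, but code running as the webserver user can still read it.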

Absolute vs. Relative Paths

Please note the leading / (slash) in each of these paths. That means that they are absolute paths. You can also use relative paths (any path not starting with a /) in your code. However, please be aware that in your website's executing environment /home/vhosts/VHOST_NAME.DOMAIN/html will be its current working directory, so all relative paths will be relative to that location.
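The difference can be shown with a short Python sketch. The resolve helper is hypothetical; it simply mimics how a relative path is interpreted when the working directory is the vhost's html directory.

```python
import posixpath

DOCROOT = "/home/vhosts/VHOST_NAME.DOMAIN/html"


def resolve(path, cwd=DOCROOT):
    """Return the absolute path that `path` refers to when run from `cwd`."""
    # posixpath.join discards cwd when path is already absolute.
    return posixpath.normpath(posixpath.join(cwd, path))
```

So resolve("../etc/db.ini") points at /home/vhosts/VHOST_NAME.DOMAIN/etc/db.ini, while resolve("/tmp/x") stays /tmp/x regardless of the working directory.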

Shared vs. Local

Also note that, as outlined in more detail below, although the /home/vhosts/... area is shared by all servers, other areas of an individual server's filesystem are not. For example, /tmp, which is a common area for applications to dump uploaded files, is different on each server. Thus, depending upon your application, you may need to set your upload path to somewhere in the shared /home/vhosts/... area for uploaded files to be visible everywhere.
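A webapp could guard against this with a check along the following lines. This is a minimal Python sketch; is_shared is a hypothetical helper and it only inspects the path string, not the mounted filesystems.

```python
import posixpath

SHARED_ROOT = "/home/vhosts"


def is_shared(path):
    """True if an absolute path lies inside the shared /home/vhosts area."""
    if not posixpath.isabs(path):
        raise ValueError("expected an absolute path")
    normalized = posixpath.normpath(path)
    return posixpath.commonpath([SHARED_ROOT, normalized]) == SHARED_ROOT
```

An upload directory such as /home/vhosts/VHOST_NAME.DOMAIN/data/uploads (a made-up location) passes the check, whereas anything under /tmp does not.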

Filesystem Access

The following are typical access methods to the shared /home/vhosts/ content:

  • SFTP
    A restricted SFTP service for managing vhost content is available for users with the WEBEDIT acl.
    For this service, all paths are of the form /home/vhosts/... as specified above.

    Some SFTP clients may expect URIs of the form sftp://user@webedit.cae.wisc.edu. In this case paths are added on to the end of this URI as before. For instance, sftp://user@webedit.cae.wisc.edu/home/vhosts/VHOST_NAME.DOMAIN.

    Interactive SFTP sessions will be placed in the /home/vhosts directory by default.

    Please see https://kb.wisc.edu/cae/page.php?id=6871 for more details.

  • Shell

    Since logons to the actual server that hosts a given vhost's webserver are not allowed, we provide the following methods for shell access to the shared vhost content.

    • For users with the UNIX-LAB acl, vhost content can be managed from any tux machine (eg: best-tux.cae.wisc.edu) at /home/vhosts/. Note, however, that the lab machines are a different architecture and have different packages available than the servers.
    • A simplified secure shell (eg: ssh) environment similar to the one provided by a real webserver is provided for users with both the WEBEDIT and CRONTAB acls at webshell.cae.wisc.edu.

      The server has no home directories, but it does have a superset of the packages that the real servers have. It allows shell access for vhost management functions such as setfacl for managing permissions, crontab for setting up automatic tasks, vim for editing files, php for running scripts, and mysql for interacting with databases, as well as for running other website specific tools (eg: drush). Please note that we do not provide drush or other website specific tools.

  • SMB/CIFS
    An SMB/CIFS service for managing vhost content is available for users with the WEBEDIT acl.
    • OS X (and other Unix variants including personal Linux installs):
      In this case the SMB URI smb://webedit.cae.wisc.edu/VHOST_NAME.DOMAIN is equivalent to the shared path /home/vhosts/VHOST_NAME.DOMAIN on the servers.
    • Windows
      In this case the SMB URI \\webedit.cae.wisc.edu\VHOST_NAME.DOMAIN is equivalent to the shared path /home/vhosts/VHOST_NAME.DOMAIN on the servers.

    Please see https://kb.wisc.edu/cae/page.php?id=6955 for more details.

  • MySQL

    Though not actually filesystem related, any MySQL databases you have created for your vhost using the tools on its vhost control panel page will be accessible using the hostname mysql-general.cae.wisc.edu or mysql.cae.wisc.edu (they are currently synonyms), not localhost.

    Please also note that MySQL usernames and passwords are distinct from your CAE credentials.

    Managing your databases can be done using some of the following CAE provided tools:

    You can also use any number of other MySQL compatible clients to interact with the data and schemas in your database, such as, but not limited to:

    • MySQL CLI (available on the best-tux.cae.wisc.edu Linux lab machines)
    • MySQL Workbench (formerly MySQL Admin or MySQL GUI Tools, and also available on the Linux lab machines)
    • TOra (also available on the Linux lab machines)

Please note that there is no direct filesystem or database access available for CAE managed WordPress instances.
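To summarize how the same vhost directory appears through each access method, here is a small Python sketch. The AccessPoints class and access_points function are hypothetical names; the hostnames and path forms are the ones given in this section.

```python
from dataclasses import dataclass

WEBEDIT_HOST = "webedit.cae.wisc.edu"
# Currently synonyms; note that "localhost" is not among them.
MYSQL_HOSTS = ("mysql-general.cae.wisc.edu", "mysql.cae.wisc.edu")


@dataclass(frozen=True)
class AccessPoints:
    server_path: str  # path as seen on the servers, SFTP, and shell hosts
    sftp_uri: str     # for SFTP clients that expect a URI
    smb_uri: str      # OS X and other Unix variants
    unc_path: str     # Windows


def access_points(user, vhost):
    """Build the equivalent locations for one vhost's shared content."""
    server_path = f"/home/vhosts/{vhost}"
    return AccessPoints(
        server_path=server_path,
        sftp_uri=f"sftp://{user}@{WEBEDIT_HOST}{server_path}",
        smb_uri=f"smb://{WEBEDIT_HOST}/{vhost}",
        unc_path=f"\\\\{WEBEDIT_HOST}\\{vhost}",
    )
```

Note that the SMB forms name the vhost directly as the share, while the SFTP form carries the full /home/vhosts/... path.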

Filesystem Permissions

Filesystem permissions control which users, including the one running your webserver, have access to read, write, and execute the files and directories in your website's path.

This is done using a combination of Unix mode bits (especially the setgid bit on directories, sometimes called the group sticky bit) and (default) POSIX ACLs, so that your editing group always has read/write permission on the content in your website and your website user always has read access to it (otherwise it would not be able to serve your website's content).

For various reasons (bad umask, incorrect setfacl, chown, chgrp, chmod, etc.) these permissions can become incorrect. To fix them you can use the "Change Permissions" button on the vhost control panel pages.
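If you want to check the mode bits yourself before resorting to the button, something like the following Python sketch may help. describe_mode is a hypothetical helper; it only looks at the classic mode bits (including the group sticky, or setgid, bit) and cannot see ACL entries, for which getfacl is the usual tool.

```python
import stat


def describe_mode(mode):
    """Report the mode bits most relevant to shared vhost directories."""
    return {
        "setgid": bool(mode & stat.S_ISGID),        # new files inherit the group
        "group_read": bool(mode & stat.S_IRGRP),
        "group_write": bool(mode & stat.S_IWGRP),
    }
```

In practice the argument would come from os.stat(path).st_mode for a directory under /home/vhosts/VHOST_NAME.DOMAIN.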

Website Write Permission

For your website to be able to accept uploaded data, write to its webapp's configuration data, etc., the user that runs its webserver needs write access to the relevant files and directories.

To set this up you can do one of the following:

  • To set it up for the entire vhost, use the "Change Permissions" tools referenced above and select the "Allow the webserver to write to files." checkbox.
  • To set it up for a particular directory tree use something like the following:
    # setfacl -R -m u:web-EDITING_GROUP:rwX /home/vhosts/VHOST_NAME.DOMAIN/html/some_dir_path
    # setfacl -R -m d:u:web-EDITING_GROUP:rwX /home/vhosts/VHOST_NAME.DOMAIN/html/some_dir_path
    Alternatively, you should be able to use the Windows Properties Security dialog when accessing the data via SMB.
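If you need to apply this to several directories, the two setfacl commands can be generated rather than typed by hand. The Python sketch below only builds the command lines and does not run them; both function names are hypothetical, and the setfacl arguments are the ones shown above.

```python
import shlex


def webserver_write_acl_commands(editing_group, directory):
    """Return the two setfacl invocations as argument lists."""
    user = f"web-{editing_group}"
    return [
        # Grant access on everything that already exists in the tree.
        ["setfacl", "-R", "-m", f"u:{user}:rwX", directory],
        # Set the default ACL so newly created files inherit the access.
        ["setfacl", "-R", "-m", f"d:u:{user}:rwX", directory],
    ]


def as_shell(commands):
    """Render argument lists as shell-quoted command strings."""
    return [shlex.join(argv) for argv in commands]
```

Each argument list could be passed to subprocess.run on a machine where you have shell access to the vhost content; shlex.join (Python 3.8 or later) takes care of quoting paths that contain spaces.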



Keywords: nwwo web hosting vhost vhosts vhosting apache lamp
Doc ID: 21440
Owner: Brian K.
Group: Computer-Aided Engineering
Created: 2011-11-23 13:29 CDT
Updated: 2016-09-26 09:00 CDT
Sites: Computer-Aided Engineering